| Title: | Multivariate Survival Data with Network Structures |
|---|---|
| Description: | Implements a semi-parametric estimation framework combined with a boosting algorithm to marginally estimate the conditional cumulative distribution function of survival times given informative covariates. It then utilizes the graphical lasso method to reconstruct network structures among multivariate time-to-event variables, accommodating both multivariate outcomes measured within a single dataset and survival times integrated from heterogeneous (multi-source) datasets. |
| Authors: | Li-Pang Chen [aut, cre] |
| Maintainer: | Li-Pang Chen <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-06-05 12:16:14 UTC |
| Source: | https://github.com/cran/MSN |
Implements a semi-parametric estimation framework combined with a boosting algorithm to marginally estimate conditional cumulative distribution functions of survival times given informative covariates. Based on these marginal estimates, the graphical lasso method is applied to reconstruct network structures among multivariate survival outcomes.
The MSN package provides robust functions to analyze multivariate time-to-event outcomes while characterizing their underlying network dependencies.
The package features a cross-validated semi-parametric boosting approach to estimate marginal conditional cumulative distribution functions for survival data under covariate adjustment. The functions offer multiple algorithmic options for users, including the specification of boosting steps, learning rates, and constants for thresholding small values in the estimators. Based on the marginalized estimates, the implied covariance structure is determined via pairwise Kendall's tau. The graphical lasso is subsequently implemented to reconstruct network structures, fully supporting data integrated from heterogeneous sources as well as multiple outcomes within a single dataset.
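The step from pairwise Kendall's tau to an implied correlation matrix can be illustrated with base R. This is a minimal sketch on uncensored toy data, and the mapping `sin(pi * tau / 2)` is an assumption (the usual Gaussian copula relation); the source does not state which transformation the package applies or how censoring enters the tau estimate.

```r
# Toy data: two dependent variables and one independent variable.
set.seed(123)
n <- 500
z1 <- rnorm(n)
z2 <- 0.8 * z1 + sqrt(1 - 0.8^2) * rnorm(n)  # correlation 0.8 with z1
z3 <- rnorm(n)                               # independent of z1 and z2

# Monotone transforms (here exp) leave Kendall's tau unchanged,
# so skewed, positive "survival-like" variables are handled naturally.
dat <- cbind(T1 = exp(z1), T2 = exp(z2), T3 = exp(z3))

# Pairwise Kendall's tau matrix.
tau_mat <- cor(dat, method = "kendall")

# Assumed mapping from tau to a correlation matrix (Gaussian copula relation).
cor_mat <- sin(pi * tau_mat / 2)

round(cor_mat, 2)
```

A matrix built this way can serve as the input to a graphical lasso solver in place of a sample covariance matrix.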
Chen, L.-P.
Identifies the network structure among multiple survival times or time-to-event variables, accommodating both multivariate outcomes within a single dataset and data integrated from heterogeneous sources.
network_estimate(Y, delta, X, num = 10, rho = 0.1, max_iter = 2000,
                 learning.rate = 0.5, kappa = 0.5, stop_value = 0.001)
| Y | A list of multivariate survival times or events across samples, utilized for training the estimated conditional cumulative distribution functions. |
|---|---|
| delta | A list of censoring indicators for multivariate time-to-event outcomes, used for training the estimated conditional cumulative distribution functions. |
| X | A list of covariate matrices for the multiple time-to-event processes, used for training the estimated conditional cumulative distribution functions. |
| num | A positive integer specifying the number of grid points used to evaluate the integral of the estimated cumulative distribution function. |
| rho | A positive tuning parameter controlling the sparsity of the graphical lasso estimator. |
| max_iter | A positive integer indicating the total number of boosting iterations to be executed. |
| learning.rate | A positive step-size parameter (learning rate) that scales the contribution of each boosting iteration. |
| kappa | A positive thresholding constant used to zero out small estimated coefficients in the |
| stop_value | A positive tolerance value used as the convergence criterion for early stopping the boosting procedure. |
This function operates in three sequential steps. It first adopts semi_estimation to compute the marginal cumulative distribution functions for each time-to-event variable. These estimated distributions are then utilized to evaluate Kendall's tau for pairwise outcomes. Finally, the graphical lasso method is implemented to reconstruct the underlying network structure based on the pairwise association matrix.
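For reference, the graphical lasso in its standard form solves the penalized likelihood problem below. Here $S$ stands for the pairwise association matrix from the second step, $\Theta$ for the precision matrix, and $\rho$ for the sparsity parameter `rho`. The source does not say whether the package penalizes the diagonal of $\Theta$, so this is the textbook formulation and not necessarily the exact one implemented.

```latex
\hat{\Theta} = \arg\min_{\Theta \succ 0}
  \left\{ \operatorname{tr}(S\Theta) - \log\det\Theta + \rho \,\lVert \Theta \rVert_{1} \right\}
```

Larger values of $\rho$ push more off-diagonal entries of $\hat{\Theta}$ to exactly zero, which corresponds to fewer edges in the estimated network.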
| cov_matrix | An |
|---|---|
| pre_matrix | An |
| num_edge | The number of edges in the estimated graph. |
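The relation between a precision matrix and the edge count can be sketched as follows. The matrix below is a hypothetical stand-in for `pre_matrix`, not output from the package, and the tolerance `tol` is an assumption; the package may count edges with a different rule.

```r
# Hypothetical 4 x 4 precision matrix (illustration only).
pre_matrix <- matrix(c(
  1.5, 0.4, 0.0, 0.0,
  0.4, 1.2, 0.0, 0.0,
  0.0, 0.0, 1.1, 0.3,
  0.0, 0.0, 0.3, 1.4
), nrow = 4, byrow = TRUE)

# An edge j-k is present when the off-diagonal entry (j, k) is non-zero.
tol <- 1e-8                      # assumed numerical tolerance
adjacency <- abs(pre_matrix) > tol
diag(adjacency) <- FALSE         # no self-loops

# Count each undirected edge once by looking at the upper triangle only.
num_edge <- sum(adjacency[upper.tri(adjacency)])

# List the edges as (row, col) index pairs.
edge_list <- which(adjacency & upper.tri(adjacency), arr.ind = TRUE)

num_edge
edge_list
```

In this toy matrix the non-zero off-diagonal entries link outcomes 1 and 2 and outcomes 3 and 4, so the graph has two edges.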
Chen, L.-P.
library(MASS)
library(MSN)

# Simulation settings: p covariates, J time-to-event outcomes, n subjects
p = 5
J = 5
n = 10

# Covariates shared by all outcomes
X1 = mvrnorm(n, rep(0, p), diag(1, p))

# Sparse coefficients: three active covariates per outcome
beta = matrix(0, nrow = p, ncol = J)
for (j in 1:J) {
  beta[sample(1:p, 3), j] <- 1
}

# True precision matrix with edges 1-2, 3-4, and 3-5
P = matrix(0, J, J)
P[1,2] = 1; P[3,4:5] = 1
P[2,1] = 1; P[4:5,3] = 1
diag(P) = max(eigen(P)$values) + 0.1
Sigma = cov2cor(solve(P))

# Dependent uniform variables generated through a Gaussian copula
Z = mvrnorm(n, mu = rep(0, J), Sigma = Sigma)
U = pnorm(Z)

# Event times from different marginal models
event = list()
event[[1]] = -log(1 - U[,1]) / exp(X1 %*% beta[,1])
event[[2]] = -log(1 - U[,2]) / exp(X1 %*% beta[,2])
event[[3]] = sqrt(-2 * log(1 - U[,3]) / exp(X1 %*% beta[,3]))
event[[4]] = exp(X1 %*% beta[,4] + qnorm(U[,4]))
event[[5]] = exp(X1 %*% beta[,5] + qnorm(U[,5]))

# Observed times and censoring indicators; the same censoring draw C is used
# for both, so that delta is consistent with Y
Y = list(); delta = list(); X = list()
for (j in 1:J) {
  C = rexp(n, 1)
  Y[[j]] = pmin(event[[j]], C)
  delta[[j]] = (event[[j]] <= C) * 1
  X[[j]] = X1
}

net_est = network_estimate(Y = Y, delta = delta, X = X, rho = 0.05)
Implements the Bayesian Information Criterion (BIC) to select the optimal tuning parameter for the graphical lasso method.
network_estimate_BIC(Y, delta, X, num = 10, rho_seq = seq(0, 0.5, length = 10),
                     max_iter = 2000, learning.rate = 0.5, kappa = 0.5,
                     stop_value = 0.001)
| Y | A list of multivariate survival times or events across samples, utilized for training the estimated conditional cumulative distribution functions. |
|---|---|
| delta | A list of censoring indicators for multivariate time-to-event outcomes, used for training the estimated conditional cumulative distribution functions. |
| X | A list of covariate matrices for the multiple time-to-event processes, used for training the estimated conditional cumulative distribution functions. |
| num | A positive integer specifying the number of grid points used to evaluate the integral of the estimated cumulative distribution function. |
| rho_seq | A user-specified numeric vector representing the candidate sequence of tuning parameters for the graphical lasso. |
| max_iter | A positive integer indicating the total number of boosting iterations to be executed. |
| learning.rate | A positive step-size parameter (learning rate) that scales the contribution of each boosting iteration. |
| kappa | A positive thresholding constant used to zero out small estimated coefficients in the |
| stop_value | A positive tolerance value used as the convergence criterion for early stopping the boosting procedure. |
The function conducts a grid search across candidate tuning parameters to compute their corresponding BIC scores. By balancing model fit against the number of non-zero edges, it determines the optimal tuning parameter for the graphical lasso algorithm. Users can subsequently pass this optimal parameter into network_estimate to perform the final network structure detection and visualization.
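The selection rule itself is simple: the candidate with the smallest BIC wins. The sketch below uses made-up BIC values purely for illustration (real values would come from the `bic_seq` element of the returned object), and the commented final call assumes the objects `Y`, `delta`, and `X` from the examples.

```r
# Candidate tuning parameters (the default grid of network_estimate_BIC).
rho_seq <- seq(0, 0.5, length.out = 10)

# Made-up BIC values, one per candidate; NOT results from the package.
bic_seq <- c(152.3, 140.8, 133.1, 129.6, 131.2,
             136.9, 143.5, 150.0, 158.4, 166.1)

# The optimal tuning parameter minimizes the BIC.
opt_index <- which.min(bic_seq)
opt_rho <- rho_seq[opt_index]
opt_rho

# The selected value can then be passed to the final fit, for example:
# net_final <- network_estimate(Y = Y, delta = delta, X = X, rho = opt_rho)
```

A BIC curve that is still decreasing at the largest candidate suggests that the grid in `rho_seq` should be extended.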
| bic_seq | A numeric vector of BIC values corresponding to each candidate tuning parameter specified in rho_seq. |
|---|---|
| opt_rho | The optimal tuning parameter selected via the minimization of the BIC criterion. |
| pre_matrix | An |
| num_edge | The number of edges in the estimated graph. |
Chen, L.-P.
library(MASS)
library(MSN)

# Simulation settings: p covariates, J time-to-event outcomes, n subjects
p = 5
J = 5
n = 10

# Covariates shared by all outcomes
X1 = mvrnorm(n, rep(0, p), diag(1, p))

# Sparse coefficients: three active covariates per outcome
beta = matrix(0, nrow = p, ncol = J)
for (j in 1:J) {
  beta[sample(1:p, 3), j] <- 1
}

# True precision matrix with edges 1-2, 3-4, and 3-5
P = matrix(0, J, J)
P[1,2] = 1; P[3,4:5] = 1
P[2,1] = 1; P[4:5,3] = 1
diag(P) = max(eigen(P)$values) + 0.1
Sigma = cov2cor(solve(P))

# Dependent uniform variables generated through a Gaussian copula
Z = mvrnorm(n, mu = rep(0, J), Sigma = Sigma)
U = pnorm(Z)

# Event times from different marginal models
event = list()
event[[1]] = -log(1 - U[,1]) / exp(X1 %*% beta[,1])
event[[2]] = -log(1 - U[,2]) / exp(X1 %*% beta[,2])
event[[3]] = sqrt(-2 * log(1 - U[,3]) / exp(X1 %*% beta[,3]))
event[[4]] = exp(X1 %*% beta[,4] + qnorm(U[,4]))
event[[5]] = exp(X1 %*% beta[,5] + qnorm(U[,5]))

# Observed times and censoring indicators; the same censoring draw C is used
# for both, so that delta is consistent with Y
Y = list(); delta = list(); X = list()
for (j in 1:J) {
  C = rexp(n, 1)
  Y[[j]] = pmin(event[[j]], C)
  delta[[j]] = (event[[j]] <= C) * 1
  X[[j]] = X1
}

net_est_bic = network_estimate_BIC(Y = Y, delta = delta, X = X,
                                   rho_seq = seq(0, 0.8, length = 5))
Implements a gradient boosting algorithm to select informative covariates and marginally estimate the conditional cumulative distribution functions of survival times. Based on the fitted model, this function further evaluates predicted cumulative distribution values for user-specified survival times and covariate profiles.
semi_estimation(y_vec, X_target, Y_train, Delta_train, X_train, max_iter = 2000,
                learning.rate = 0.5, kappa = 0.5, stop_value = 0.001)
| y_vec | A user-specified numeric vector or matrix at which the estimated conditional cumulative distribution function is to be evaluated. |
|---|---|
| X_target | A user-specified matrix or data frame of new covariate profiles, used for evaluating the estimated conditional cumulative distribution function. |
| Y_train | A list of multivariate survival times or events across samples, utilized for training the estimated conditional cumulative distribution functions. |
| Delta_train | A list of censoring indicators for multivariate time-to-event outcomes, used for training the estimated conditional cumulative distribution functions. |
| X_train | A list of covariate matrices for the multiple time-to-event processes, used for training the estimated conditional cumulative distribution functions. |
| max_iter | A positive integer indicating the total number of boosting iterations to be executed. |
| learning.rate | A positive step-size parameter (learning rate) that scales the contribution of each boosting iteration. |
| kappa | A positive thresholding constant used to zero out small estimated coefficients in the |
| stop_value | A positive tolerance value used as the convergence criterion for early stopping the boosting procedure. |
The function executes a semi-parametric boosting approach to identify critical risk factors and marginally estimate the conditional cumulative distribution function. After finalizing the optimization process, the resulting estimator can be further queried by the user. By passing designated survival times and new covariate matrices into the corresponding arguments, the function computes the predicted marginal cumulative probabilities, facilitating flexible post-estimation analysis.
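The roles of `max_iter`, `learning.rate`, `kappa`, and `stop_value` can be illustrated with a generic component-wise boosting loop. The sketch below uses an ordinary squared-error loss on uncensored toy data, not the package's semi-parametric survival loss. The helper `boost_ls` is hypothetical and only borrows the argument names; how the package actually applies the threshold and the stopping rule is an assumption here.

```r
# Toy regression data: only covariates 1 and 3 are informative.
set.seed(1)
n <- 200
p <- 5
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(2, 0, -1.5, 0, 0)
y <- as.numeric(X %*% beta_true + rnorm(n, sd = 0.1))

# Hypothetical helper: component-wise boosting under squared-error loss.
boost_ls <- function(X, y, max_iter = 2000, learning.rate = 0.5,
                     kappa = 0.5, stop_value = 0.001) {
  beta <- rep(0, ncol(X))
  iter <- 0
  for (iter in seq_len(max_iter)) {
    # Negative gradient of the loss 0.5 * mean((y - X beta)^2).
    grad <- as.numeric(crossprod(X, y - X %*% beta)) / nrow(X)
    # Update only the coordinate with the steepest descent direction.
    j <- which.max(abs(grad))
    step <- learning.rate * grad[j] / mean(X[, j]^2)
    beta[j] <- beta[j] + step
    # Early stopping once the update becomes negligible.
    if (abs(step) < stop_value) break
  }
  # Zero out small coefficients (assumed role of kappa).
  beta[abs(beta) < kappa] <- 0
  list(est_beta = beta, iterations = iter)
}

fit <- boost_ls(X, y)
fit$est_beta
fit$iterations
```

A smaller `learning.rate` takes more iterations to reach the same fit but makes each update more conservative, while a larger `kappa` removes more weak covariates from the final coefficient vector.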
| est_beta | A |
|---|---|
| est_F | A numeric matrix of the estimated conditional cumulative distribution function values, evaluated at the specified survival time-points |
Chen, L.-P.
library(MASS)
library(MSN)

# Simulation settings: p covariates, J time-to-event outcomes, n subjects
p = 5
J = 5
n = 10

# Covariates shared by all outcomes
X1 = mvrnorm(n, rep(0, p), diag(1, p))

# Sparse coefficients: three active covariates per outcome
beta = matrix(0, nrow = p, ncol = J)
for (j in 1:J) {
  beta[sample(1:p, 3), j] <- 1
}

# True precision matrix with edges 1-2, 3-4, and 3-5
P = matrix(0, J, J)
P[1,2] = 1; P[3,4:5] = 1
P[2,1] = 1; P[4:5,3] = 1
diag(P) = max(eigen(P)$values) + 0.1
Sigma = cov2cor(solve(P))

# Dependent uniform variables generated through a Gaussian copula
Z = mvrnorm(n, mu = rep(0, J), Sigma = Sigma)
U = pnorm(Z)

# Event times from different marginal models
event = list()
event[[1]] = -log(1 - U[,1]) / exp(X1 %*% beta[,1])
event[[2]] = -log(1 - U[,2]) / exp(X1 %*% beta[,2])
event[[3]] = sqrt(-2 * log(1 - U[,3]) / exp(X1 %*% beta[,3]))
event[[4]] = exp(X1 %*% beta[,4] + qnorm(U[,4]))
event[[5]] = exp(X1 %*% beta[,5] + qnorm(U[,5]))

# Observed times and censoring indicators; the same censoring draw C is used
# for both, so that delta is consistent with Y
Y = list(); delta = list(); X = list()
for (j in 1:J) {
  C = rexp(n, 1)
  Y[[j]] = pmin(event[[j]], C)
  delta[[j]] = (event[[j]] <= C) * 1
  X[[j]] = X1
}

# Fit the marginal model for the last outcome and evaluate the estimated
# conditional CDF on a grid of 10 time points for the training covariates
semi_est = semi_estimation(y_vec = seq(0.1, max(Y[[J]]), length = 10),
                           X_target = X[[J]], Y_train = Y[[J]],
                           Delta_train = delta[[J]], X_train = X[[J]])