Title: | Average Treatment Effects with Measurement Error and Variable Selection for Confounders |
---|---|
Description: | A recent method proposed by Yi and Chen (2023) <doi:10.1177/09622802221146308> is used to estimate the average treatment effects using noisy data containing both measurement error and spurious variables. The package 'AteMeVs' contains a set of functions that provide a step-by-step estimation procedure, including the correction of the measurement error effects, variable selection for building the model used to estimate the propensity scores, and estimation of the average treatment effects. The functions contain multiple options for users to implement, including different ways to correct for the measurement error effects, distinct choices of penalty functions to do variable selection, and various regression models to characterize propensity scores. |
Authors: | Li-Pang Chen [aut, cre], Grace Yi [aut] |
Maintainer: | Li-Pang Chen <[email protected]> |
License: | GPL-2 |
Version: | 0.1.0 |
Built: | 2024-10-29 03:29:52 UTC |
Source: | https://github.com/cran/AteMeVs |
A recent method proposed by Yi and Chen (2023) <doi:10.1177/09622802221146308> is implemented to estimate the average treatment effects using noisy data containing both measurement error and spurious variables.
The R package 'AteMeVs', which refers to estimation of the Average Treatment Effects with Measurement Error and Variable Selection for confounders, contains a set of functions that provide a step-by-step estimation procedure, including the correction of the measurement error effects, variable selection for building the model used to estimate the propensity scores, and estimation of the average treatment effects. The functions contain multiple options for users to implement, including different ways to correct for the measurement error effects, distinct choices of penalty functions to do variable selection, and various regression models to characterize propensity scores.
Chen, L.-P. and Yi, G. Y.
Maintainer: Li-Pang Chen <[email protected]>
Yi, G. Y. and Chen, L.-P. (2023). Estimation of the average treatment effect with variable selection and measurement error simultaneously addressed for potential confounders. Statistical Methods in Medical Research, 32, 691-711.
This function is used to generate an artificial dataset, which contains potential outcomes, treatments, and error-prone confounders.
DG(X,Z,gamma_X,gamma_Z,Sigma_e,outcome="continuous")
X |
an |
Z |
an |
gamma_X |
a |
gamma_Z |
a |
Sigma_e |
a |
outcome |
the indicator of the nature of the outcome variable; |
This function is used to generate artificial data, including potential outcomes, binary treatments, and error-prone and precisely measured confounders.
data |
an |
Chen, L.-P. and Yi, G. Y.
Yi, G. Y. and Chen, L.-P. (2023). Estimation of the average treatment effect with variable selection and measurement error simultaneously addressed for potential confounders. Statistical Methods in Medical Research, 32, 691-711.
library(MASS)

n = 800
p_x = 10    # dimension of parameters
p_z = 10
p = p_x + p_z
gamma_X = c(rep(1,2),rep(0,p_x-2))
gamma_Z = c(rep(1,2),rep(0,p_z-2))
gamma = c(gamma_X, gamma_Z)
mu_X = rep(0,p_x)
mu_Z = rep(0,p_z)
Sigma_X = diag(1,p_x,p_x)
Sigma_Z = diag(1,p_z,p_z)
Sigma_e = diag(0.2,p_x)

X = mvrnorm(n, mu_X, Sigma_X, tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
Z = mvrnorm(n, mu_Z, Sigma_Z, tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
data = DG(X,Z,gamma_X,gamma_Z,Sigma_e,outcome="continuous")
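To make the notion of error-prone confounders concrete, the following is a minimal sketch of a classical additive measurement error model, in which the observed surrogate is W = X + e with e drawn from a normal distribution with mean zero and covariance Sigma_e. Whether DG uses exactly this form internally is an assumption; the names W and e are illustrative and are not part of the package. Only base R is used.

```r
set.seed(1)

n   <- 800
p_x <- 10

# True confounders (not observed in practice)
X <- matrix(rnorm(n * p_x), n, p_x)

# Measurement error with covariance Sigma_e.
# chol() returns R with t(R) %*% R = Sigma_e, so rows of (N(0, I) %*% R) have covariance Sigma_e.
Sigma_e <- diag(0.2, p_x)
e <- matrix(rnorm(n * p_x), n, p_x) %*% chol(Sigma_e)

# Observed, error-prone surrogate (illustrative name)
W <- X + e

# The surrogate is noisier than the true confounder
mean_var_X <- mean(apply(X, 2, var))
mean_var_W <- mean(apply(W, 2, var))
c(mean_var_X = mean_var_X, mean_var_W = mean_var_W)
```

Under this model, fitting the treatment model with W in place of X generally biases the coefficient estimates, which is the effect the correction step is meant to address.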
This function is used to estimate the average treatment effect by implementing the simulation and extrapolation (SIMEX) method with informative and error-eliminated confounders accommodated.
EST_ATE(data, PS="logistic", Psi=seq(0,1,length=10), K=200, gamma,p_x=p, extrapolate="quadratic", Sigma_e, replicate = "FALSE", RM = 0, bootstrap = 100)
data |
an |
PS |
the specification of a link function in the treatment model. |
Psi |
a user-specified sequence of non-negative values taken from an interval. The default is set as seq(0,1,length=10). |
p_x |
the dimension of the error-prone confounders |
K |
a user-specified positive integer. The default is 200. |
gamma |
a vector of estimators for the treatment model, which is derived by using VSE_PS. |
extrapolate |
the extrapolation function in Step 3. |
Sigma_e |
the covariance matrix for the measurement error model |
replicate |
the indicator for the availability of repeated measurements in the confounders. |
RM |
a |
bootstrap |
a user-specified positive integer representing the number of generated bootstrap samples to be applied with the estimation procedure |
This function is used to implement the simulation and extrapolation (SIMEX) method, with informative confounders accommodated, to estimate the average treatment effect.
estimate |
a point estimate of the average treatment effect |
variance |
a variance estimate associated with the estimate of the average treatment effect |
p-value |
the resulting p-value of the average treatment effect |
Chen, L.-P. and Yi, G. Y.
Yi, G. Y. and Chen, L.-P. (2023). Estimation of the average treatment effect with variable selection and measurement error simultaneously addressed for potential confounders. Statistical Methods in Medical Research, 32, 691-711.
library(MASS)

n = 800
p_x = 10    # dimension of parameters
p_z = 10
p = p_x + p_z
gamma_X = c(rep(1,2),rep(0,p_x-2))
gamma_Z = c(rep(1,2),rep(0,p_z-2))
gamma = c(gamma_X, gamma_Z)
mu_X = rep(0,p_x)
mu_Z = rep(0,p_z)
Sigma_X = diag(1,p_x,p_x)
Sigma_Z = diag(1,p_z,p_z)
Sigma_e = diag(0.2,p_x)

X = mvrnorm(n, mu_X, Sigma_X, tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
Z = mvrnorm(n, mu_Z, Sigma_Z, tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
data = DG(X,Z,gamma_X,gamma_Z,Sigma_e,outcome="continuous")

y = as.vector(SIMEX_EST(data,PS="logistic",Psi = seq(0,2,length=10),p_x=length(gamma_X),K=5,
    Sigma_e=diag(0.2,p_x)))
V = diag(1,length(y),length(y))

est_lasso_cv = VSE_PS(V,y,method="lasso",cv="TRUE")
EST_ATE(data, Psi = seq(0,2,length=10),p_x=length(gamma_X),K=5, gamma=est_lasso_cv,
    Sigma_e=diag(0.2,p_x),bootstrap = 10)

est_scad_cv = VSE_PS(V,y,method="scad",cv="TRUE")
EST_ATE(data, Psi = seq(0,2,length=10),p_x=length(gamma_X),K=5, gamma=est_scad_cv,
    Sigma_e=diag(0.2,p_x),bootstrap = 10)

est_mcp_cv = VSE_PS(V,y,method="mcp",cv="TRUE")
EST_ATE(data, Psi = seq(0,2,length=10),p_x=length(gamma_X),K=5, gamma=est_mcp_cv,
    Sigma_e=diag(0.2,p_x),bootstrap = 10)
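The returned estimate, variance, and p-value can be illustrated with a generic propensity-score estimator. The sketch below uses an inverse probability weighting (IPW) estimator with a nonparametric bootstrap for the variance and a Wald-type p-value. It is an assumption that EST_ATE follows this general pattern; the sketch ignores measurement error and variable selection, and the helper ipw_ate is hypothetical and not part of the package.

```r
set.seed(3)

n   <- 2000
x   <- rnorm(n)                        # a single, precisely measured confounder
trt <- rbinom(n, 1, plogis(0.5 * x))   # binary treatment from a logistic model
Y   <- 2 * trt + x + rnorm(n)          # outcome; the true ATE is 2 by construction

# Hypothetical helper: IPW estimate of the ATE with a logistic propensity score
ipw_ate <- function(Y, trt, x) {
  ps <- fitted(glm(trt ~ x, family = binomial()))
  mean(trt * Y / ps) - mean((1 - trt) * Y / (1 - ps))
}

ate_hat <- ipw_ate(Y, trt, x)

# Nonparametric bootstrap: resample subjects and re-estimate
B <- 100
boot_est <- replicate(B, {
  idx <- sample.int(n, replace = TRUE)
  ipw_ate(Y[idx], trt[idx], x[idx])
})

var_hat <- var(boot_est)
p_value <- 2 * pnorm(-abs(ate_hat / sqrt(var_hat)))   # Wald-type test of ATE = 0 (assumed form)

list(estimate = ate_hat, variance = var_hat, p_value = p_value)
```

A larger number of bootstrap samples generally gives a more stable variance estimate at a proportional increase in run time, which is the trade-off the bootstrap argument controls.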
This function employs the simulation and extrapolation (SIMEX) method to correct for the measurement error effects for confounders in the treatment model.
SIMEX_EST(data, PS="logistic", Psi=seq(0,1,length=10),p_x=p, K=200, extrapolate="quadratic", Sigma_e, replicate = "FALSE", RM = 0)
data |
an |
PS |
the specification of a link function in the treatment model. |
Psi |
a user-specified sequence of non-negative values taken from an interval. The default is set as seq(0,1,length=10). |
p_x |
the dimension of the error-prone confounders |
K |
a user-specified positive integer, with the default value set as 200 |
extrapolate |
the extrapolation function in Step 3. |
Sigma_e |
the covariance matrix for the measurement error model |
replicate |
the indicator for the availability of repeated measurements in the confounders. |
RM |
a |
This function is used to implement the simulation and extrapolation (SIMEX) method to estimate parameters in the treatment model.
a vector of estimators in the treatment model
Chen, L.-P. and Yi, G. Y.
Yi, G. Y. and Chen, L.-P. (2023). Estimation of the average treatment effect with variable selection and measurement error simultaneously addressed for potential confounders. Statistical Methods in Medical Research, 32, 691-711.
library(MASS)

n = 800
p_x = 10    # dimension of parameters
p_z = 10
p = p_x + p_z
gamma_X = c(rep(1,2),rep(0,p_x-2))
gamma_Z = c(rep(1,2),rep(0,p_z-2))
gamma = c(gamma_X, gamma_Z)
mu_X = rep(0,p_x)
mu_Z = rep(0,p_z)
Sigma_X = diag(1,p_x,p_x)
Sigma_Z = diag(1,p_z,p_z)
Sigma_e = diag(0.2,p_x)

X = mvrnorm(n, mu_X, Sigma_X, tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
Z = mvrnorm(n, mu_Z, Sigma_Z, tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
data = DG(X,Z,gamma_X,gamma_Z,Sigma_e,outcome="continuous")

y = SIMEX_EST(data,PS="logistic",Psi = seq(0,2,length=10),p_x =length(gamma_X), K=5,
    Sigma_e=diag(0.2,p_x))
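The roles of Psi, K, and extrapolate in SIMEX_EST can be illustrated with a generic SIMEX procedure for a single error-prone covariate in a logistic treatment model. This is a simplified sketch of the textbook method under an assumed known error variance, not the package's implementation; all variable names are illustrative.

```r
set.seed(2)

n        <- 2000
x        <- rnorm(n)                            # true confounder (unobserved in practice)
sigma2_e <- 0.2                                 # measurement error variance, assumed known
w        <- x + rnorm(n, sd = sqrt(sigma2_e))   # error-prone surrogate
trt      <- rbinom(n, 1, plogis(0.5 + x))       # binary treatment; the true slope is 1

Psi <- seq(0, 2, length.out = 10)   # grid of added-noise levels
K   <- 20                           # simulated datasets per noise level

# Simulation step: for each psi, add extra noise with variance psi * sigma2_e,
# refit the treatment model, and average the slope over K replicates.
slope_psi <- sapply(Psi, function(psi) {
  mean(replicate(K, {
    w_psi <- w + rnorm(n, sd = sqrt(psi * sigma2_e))
    coef(glm(trt ~ w_psi, family = binomial()))[2]
  }))
})

# Extrapolation step: fit a quadratic in psi and extrapolate to psi = -1,
# the point that corresponds to no measurement error.
fit         <- lm(slope_psi ~ Psi + I(Psi^2))
simex_slope <- unname(predict(fit, newdata = data.frame(Psi = -1)))
naive_slope <- slope_psi[1]   # psi = 0 reproduces the uncorrected fit

c(naive = naive_slope, simex = simex_slope)
```

The naive slope is attenuated toward zero, and the extrapolated value moves it back toward the true slope. A larger K reduces the Monte Carlo noise in each averaged estimate, at the cost of more model fits.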
This function implements the penalized quadratic loss function to select the informative confounders.
VSE_PS(V,y,method="lasso",cv="TRUE",alpha=1)
V |
a user-specified matrix in the quadratic loss function |
y |
a vector determined by SIMEX_EST |
method |
it specifies a choice of the penalty function with options |
cv |
the usage for choosing the tuning parameter. |
alpha |
the constant appearing in the Elastic Net penalty (Zou and Hastie 2005). The default value is 1. |
This function is used to do variable selection for informative confounders by various choices of penalty functions.
a vector of estimators in the treatment model. Components with zero values represent unimportant confounders that need to be excluded; components with nonzero values identify important confounders that enter the treatment model.
Chen, L.-P. and Yi, G. Y.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267-288.
Yi, G. Y. and Chen, L.-P. (2023). Estimation of the average treatment effect with variable selection and measurement error simultaneously addressed for potential confounders. Statistical Methods in Medical Research, 32, 691-711.
Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894-942.
Zou, H., and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67, 301-320.
library(MASS)

n = 800
p_x = 10    # dimension of parameters
p_z = 10
p = p_x + p_z
gamma_X = c(rep(1,2),rep(0,p_x-2))
gamma_Z = c(rep(1,2),rep(0,p_z-2))
gamma = c(gamma_X, gamma_Z)
mu_X = rep(0,p_x)
mu_Z = rep(0,p_z)
Sigma_X = diag(1,p_x,p_x)
Sigma_Z = diag(1,p_z,p_z)
Sigma_e = diag(0.2,p_x)

X = mvrnorm(n, mu_X, Sigma_X, tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
Z = mvrnorm(n, mu_Z, Sigma_Z, tol = 1e-6, empirical = FALSE, EISPACK = FALSE)
data = DG(X,Z,gamma_X,gamma_Z,Sigma_e,outcome="continuous")

y = as.vector(SIMEX_EST(data,PS="logistic",Psi = seq(0,2,length=10),p_x=length(gamma_X), K=5,
    Sigma_e=diag(0.2,p_x)))
V = diag(1,length(y),length(y))

VSE_PS(V,y,method="lasso",cv="TRUE")
VSE_PS(V,y,method="scad",cv="TRUE")
VSE_PS(V,y,method="mcp",cv="TRUE")
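When V is the identity matrix, as in the examples above, a lasso-penalized quadratic loss has a closed-form minimizer. The sketch below assumes the objective takes the form (1/2) (y - g)' (y - g) + lambda * sum(|g|); the exact scaling used by VSE_PS, and how it chooses the tuning parameter, are not stated here, so both the objective and the fixed lambda are assumptions. The helper soft_threshold is hypothetical and not part of the package.

```r
# Hypothetical helper: componentwise minimizer of
# (1/2) * sum((y - g)^2) + lambda * sum(abs(g))
soft_threshold <- function(y, lambda) {
  sign(y) * pmax(abs(y) - lambda, 0)
}

# Illustrative error-corrected estimates: two strong signals and three near-zero values
y <- c(1.10, 0.95, 0.04, -0.08, 0.02)

gamma_hat <- soft_threshold(y, lambda = 0.1)
selected  <- which(gamma_hat != 0)   # indices of confounders kept in the treatment model

gamma_hat
selected
```

Components whose magnitude falls below lambda are set exactly to zero, which is how zero entries in the returned vector come to mark confounders excluded from the treatment model. The SCAD and MCP penalties shrink large components less than the lasso does, but in general they have no equally simple closed form for a non-identity V.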