Targeted Maximum Likelihood Estimation of Parameter of MSM
Targeted maximum likelihood estimation of the parameter of a marginal structural model (MSM) for binary point treatment effects. The tmleMSM function is minimally called with arguments (Y, A, W, MSM), where Y is a continuous or binary outcome variable, A is a binary treatment variable (A=1 for treatment, A=0 for control), and W is a matrix or dataframe of baseline covariates. MSM is a valid right-hand side of a regression formula for regressing Y on any combination of A, V, W, and T, where V defines strata and T represents the time at which repeated measures on subjects are made. Missingness in the outcome is accounted for in the estimation procedure if the missingness indicator Delta is 0 for some observations. Repeated measures can be identified using the id argument.
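Because MSM is given only as a right-hand side, it can help to see which coefficients psi a given string implies. The sketch below is a base-R illustration, not tmleMSM itself: the helper msm_design is hypothetical and simply turns the string into a formula and design matrix, assuming the usual R formula expansion (for example, "A*V" expands to A + V + A:V).

```r
# Hypothetical helper (not part of tmle): design matrix implied by an MSM
# right-hand side such as "A*V", using only base R.
msm_design <- function(msm_rhs, data) {
  f <- stats::as.formula(paste("~", msm_rhs))   # "A*V" -> ~ A*V
  stats::model.matrix(f, data = data)
}

# Saturated MSM from Example 1: psi0 + psi1*A + psi2*V + psi3*A*V
d1 <- data.frame(A = c(0, 1, 0, 1), V = c(0, 0, 1, 1))
X1 <- msm_design("A*V", d1)
print(colnames(X1))   # "(Intercept)" "A" "V" "A:V"

# Repeated-measures MSM from Example 2: psi0 + psi1*A + psi2*V + psi3*V^2 + psi4*T
d2 <- data.frame(A = c(0, 1), V = c(0.5, -1), T = c(1, 2))
X2 <- msm_design("A + V + I(V^2) + T", d2)
print(colnames(X2))   # "(Intercept)" "A" "V" "I(V^2)" "T"
```

Each column of the design matrix corresponds to one element of psi, in the order shown.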
tmleMSM(Y, A, W, V, T = rep(1,length(Y)), Delta = rep(1, length(Y)), MSM, v = NULL, Q = NULL, Qform = NULL, Qbounds = c(-Inf, Inf), Q.SL.library = c("SL.glm", "tmle.SL.dbarts2", "SL.glmnet"), cvQinit = TRUE, hAV = NULL, hAVform = NULL, g1W = NULL, gform = NULL, pDelta1 = NULL, g.Deltaform = NULL, g.SL.library = c("SL.glm", "tmle.SL.dbarts.k.5", "SL.gam"), ub = 1/0.025, family = "gaussian", fluctuation = "logistic", alpha = 0.995, id = 1:length(Y), V_SL = 5, inference = TRUE, verbose = FALSE, Q.discreteSL = FALSE, g.discreteSL = FALSE)
Y |
continuous or binary outcome variable |
A |
binary treatment indicator, 1 = treatment, 0 = control |
W |
vector, matrix, or dataframe containing baseline covariates. Factors are not currently allowed. |
V |
vector, matrix, or dataframe of covariates used to define strata |
T |
optional time for repeated measures data |
Delta |
indicator of missing outcome or treatment assignment: 1 = observed, 0 = missing |
MSM |
MSM of interest, specified as a character string giving a valid right-hand side of a regression formula (see examples) |
v |
optional value defining the strata of interest (V=v) for stratified estimation of MSM parameter |
Q |
optional nx2 matrix of initial values for Q portion of the likelihood, (E(Y|A=0,W), E(Y|A=1,W)) |
Qform |
optional regression formula for estimation of E(Y|A, W), suitable for a call to glm |
Qbounds |
vector of lower and upper bounds on Y and on predicted values for the initial estimate of Q |
Q.SL.library |
optional vector of prediction algorithms to use for Super Learner estimation of the initial Q |
cvQinit |
logical; if TRUE (default), cross-validated predicted values are used for the initial estimate of Q |
hAV |
optional nx2 matrix used in the numerator of the weights for the updating covariate and the influence curve. If unspecified, defaults to the conditional probabilities P(A=1|V), or P(A=1|T) for repeated measures data. For unstabilized weights, pass in an nx2 matrix of all 1s. |
hAVform |
optional regression formula of the form A ~ V (or A ~ T for repeated measures data), used to estimate hAV |
g1W |
optional vector of conditional treatment assignment probabilities, P(A=1|W) |
gform |
optional regression formula of the form A ~ W, used to estimate P(A=1|W) |
pDelta1 |
optional nx2 matrix of conditional probabilities for the missingness mechanism, P(Delta=1|A=0,V,W,T), P(Delta=1|A=1,V,W,T). |
g.Deltaform |
optional regression formula of the form Delta ~ A + V + W + T, used to estimate P(Delta=1|A,V,W,T) |
g.SL.library |
optional vector of prediction algorithms to use for Super Learner estimation of the components of g |
ub |
upper bound on observation weights. See Details. |
family |
family specification for working regression models, generally ‘gaussian’ for continuous outcomes (default), ‘binomial’ for binary outcomes |
fluctuation |
‘logistic’ (default), or ‘linear’ |
alpha |
used to keep predicted initial values bounded away from 0 and 1 for logistic fluctuation |
id |
optional subject identifier |
V_SL |
number of cross-validation folds for Super Learner estimation of Q and g |
inference |
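if TRUE (default), the variance-covariance matrix, standard errors, p-values, and 95% confidence intervals are calculated |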
verbose |
status messages printed if set to TRUE |
Q.discreteSL |
If TRUE, use discrete SL to estimate Q; otherwise ensemble SL is used (default). Ignored when SL is not used. |
g.discreteSL |
If TRUE, use discrete SL to estimate the components of g; otherwise ensemble SL is used (default). Ignored when SL is not used. |
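The arguments Q, g1W, and hAV let you supply nuisance estimates yourself instead of having them fitted internally. A minimal sketch of how such inputs might be built with glm for a binary outcome follows. Assumptions: the simulated data and variable names are illustrative only; simple glm fits stand in for Super Learner; and the column order of hAV (A=0 first, then A=1) is assumed to mirror the documented order for Q.

```r
set.seed(1)
n <- 200
W <- data.frame(W1 = rnorm(n), W2 = rnorm(n))
V <- rbinom(n, 1, 0.5)
A <- rbinom(n, 1, plogis(0.3 * W$W1))
Y <- rbinom(n, 1, plogis(A - 2 * V + 0.5 * W$W2))
dat <- data.frame(Y, A, V, W)   # columns Y, A, V, W1, W2

# Initial outcome regression E(Y | A, V, W) fitted with glm
Qfit <- glm(Y ~ A + V + W1 + W2, family = binomial, data = dat)
# Predict for every subject under A = 0 and under A = 1 -> n x 2 matrix
Q <- cbind(
  Q0W = predict(Qfit, newdata = transform(dat, A = 0), type = "response"),
  Q1W = predict(Qfit, newdata = transform(dat, A = 1), type = "response")
)

# Treatment mechanism P(A = 1 | W)
g1W <- predict(glm(A ~ W1 + W2, family = binomial, data = dat), type = "response")

# Weight numerator based on P(A = 1 | V); column order (A = 0, A = 1) is an assumption
pA1V <- predict(glm(A ~ V, family = binomial, data = dat), type = "response")
hAV <- cbind(1 - pA1V, pA1V)

# Unstabilized weights, as described for hAV: an n x 2 matrix of 1s
hAV_unstab <- matrix(1, nrow = n, ncol = 2)
```

These objects could then be passed as Q, g1W, and hAV in a tmleMSM call alongside Y, A, W, V, and MSM; that call is not shown here.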
ub bounds the influence curve (IC) by bounding the factor h(A,V)/[g(A,V,W)P(Delta=1|A,V,W)] between 0 and ub; default value = 1/0.025 = 40.
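The truncation described above can be sketched in a few lines of base R. This assumes the three ingredients are already available as numeric vectors evaluated at each subject's observed A; the helper name bounded_weight is hypothetical and is not part of the tmle package.

```r
# Hypothetical helper (not part of tmle): the factor
# h(A,V) / [g(A,V,W) * P(Delta=1|A,V,W)], truncated to [0, ub].
bounded_weight <- function(hAV, gAVW, pDelta1, ub = 1 / 0.025) {
  w <- hAV / (gAVW * pDelta1)   # raw weight factor for each observation
  pmin(pmax(w, 0), ub)          # clip below at 0 and above at ub
}

# Illustrative inputs: the third subject has a very small g, so its raw factor is
# 0.5 / (0.0078125 * 0.5) = 128, which exceeds the default ub of 40.
w <- bounded_weight(hAV = c(0.5, 0.5, 0.5),
                    gAVW = c(0.5, 0.03125, 0.0078125),
                    pDelta1 = c(1, 1, 0.5))
print(w)   # 1 16 40
```

Capping the factor limits the influence of subjects with near-zero treatment or observation probabilities, at the cost of some bias when the cap is active.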
psi |
MSM parameter estimate |
sigma |
variance covariance matrix |
se |
standard errors extracted from sigma |
pvalue |
two-sided p-value |
lb |
lower bound on 95% confidence interval |
ub |
upper bound on 95% confidence interval |
epsilon |
fitted value of epsilon used to target the initial Q |
psi.Qinit |
MSM parameter estimate based on the untargeted initial Q |
Qstar |
targeted estimate of Q |
Qinit |
initial estimate of Q |
g |
treatment mechanism estimate. A list with three items: |
g.AV |
estimate for h(A,V) or h(A,T). A list with three items: |
g_Delta |
missingness mechanism estimate. A list with three items: |
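The inference outputs are linked: se is extracted from sigma, and pvalue, lb, and ub follow from psi and se. A minimal sketch, assuming the standard normal-based (Wald) construction with null value 0 and a 95% level; the helper wald_summary is hypothetical, and the psi and sigma values below are illustrative inputs, not tmleMSM output.

```r
# Hypothetical helper (not part of tmle): Wald-type summary from psi and sigma
wald_summary <- function(psi, sigma, level = 0.95) {
  se <- sqrt(diag(sigma))                 # standard errors from the covariance matrix
  z <- qnorm(1 - (1 - level) / 2)         # about 1.96 for a 95% interval
  data.frame(
    psi = psi,
    se = se,
    pvalue = 2 * pnorm(-abs(psi / se)),   # two-sided test of psi = 0
    lb = psi - z * se,
    ub = psi + z * se
  )
}

psi <- c(0, 1, -2, 0.5)                   # illustrative values only
sigma <- diag(c(0.04, 0.01, 0.09, 0.16))
res <- wald_summary(psi, sigma)
print(res)
```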
Susan Gruber sgruber@cal.berkeley.edu, in collaboration with Mark van der Laan.
1. Gruber, S. and van der Laan, M.J. (2012), tmle: An R Package for Targeted Maximum Likelihood Estimation. Journal of Statistical Software, 51(13), 1-35. http://www.jstatsoft.org/v51/i13/
2. Rosenblum, M. and van der Laan, M.J. (2010), Targeted Maximum Likelihood Estimation of the Parameter of a Marginal Structural Model. The International Journal of Biostatistics, 6(2).
library(tmle)

# Example 1. Estimating MSM parameter with correctly specified regression formulas
# MSM: psi0 + psi1*A + psi2*V + psi3*A*V (saturated)
# true parameter value: psi = (0, 1, -2, 0.5)
# generate data
set.seed(100)
n <- 1000
W <- matrix(rnorm(n*3), ncol = 3)
colnames(W) <- c("W1", "W2", "W3")
V <- rbinom(n, 1, 0.5)
A <- rbinom(n, 1, 0.5)
Y <- rbinom(n, 1, plogis(A - 2*V + 0.5*A*V))
result.ex1 <- tmleMSM(Y, A, W, V, MSM = "A*V", Qform = Y~., gform = A~1,
                      hAVform = A~1, family = "binomial")
print(result.ex1)

## Not run:
# Example 2. Repeated measures data, two observations per id
# (e.g., crossover study design)
# MSM: psi0 + psi1*A + psi2*V + psi3*V^2 + psi4*T
# true parameter value: psi = (-2, 1, 0, -2, 0)
# generate data in wide format (id, W1, Y(t), W2(t), V(t), A(t))
set.seed(10)
n <- 250
id <- rep(1:n)
W1 <- rbinom(n, 1, 0.5)
W2.1 <- rnorm(n)
W2.2 <- rnorm(n)
V.1 <- rnorm(n)
V.2 <- rnorm(n)
A.1 <- rbinom(n, 1, plogis(0.5 + 0.3 * W2.1))
A.2 <- 1-A.1
Y.1 <- -2 + A.1 - 2*V.1^2 + W2.1 + rnorm(n)
Y.2 <- -2 + A.2 - 2*V.2^2 + W2.2 + rnorm(n)
d <- data.frame(id, W1, W2=W2.1, W2.2, V=V.1, V.2, A=A.1, A.2, Y=Y.1, Y.2)

# change dataset from wide to long format
longd <- reshape(d, varying = cbind(c(3, 5, 7, 9), c(4, 6, 8, 10)),
                 idvar = "id", direction = "long", timevar = "T",
                 new.row.names = NULL, sep = "")

# misspecified model for initial Q, partial misspecification for g.
# V_SL set to 2 to save time, not recommended at this sample size
result.ex2 <- tmleMSM(Y = longd$Y, A = longd$A, W = longd[,c("W1", "W2")],
                      V = longd$V, T = longd$T, MSM = "A + V + I(V^2) + T",
                      Qform = Y ~ A + V, gform = A ~ W2, id = longd$id, V_SL=2)
print(result.ex2)

# Example 3: Introduce 20% missingness in the outcome
# V_SL set to 2 to save time, not recommended at this sample size
Delta <- rbinom(nrow(longd), 1, 0.8)
result.ex3 <- tmleMSM(Y = longd$Y, A = longd$A, W = longd[,c("W1", "W2")],
                      V = longd$V, T=longd$T, Delta = Delta,
                      MSM = "A + V + I(V^2) + T", Qform = Y ~ A + V,
                      gform = A ~ W2, g.Deltaform = Delta~ 1, id=longd$id,
                      verbose = TRUE, V_SL=2)
print(result.ex3)
## End(Not run)