tmle: tmleMSM – R documentation


tmleMSM

Targeted Maximum Likelihood Estimation of Parameter of MSM

Description

Targeted maximum likelihood estimation of the parameter of a marginal structural model (MSM) for binary point treatment effects. The tmleMSM function is minimally called with arguments (Y, A, W, MSM), where Y is a continuous or binary outcome variable, A is a binary treatment variable (A=1 for treatment, A=0 for control), and W is a matrix or dataframe of baseline covariates. MSM is a valid regression formula for regressing Y on any combination of A, V, W, T, where V defines strata and T represents the time at which repeated measures on subjects are made. Missingness in the outcome is accounted for in the estimation procedure if the missingness indicator Delta is 0 for some observations. Repeated measures can be identified using the id argument.
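
Because MSM is given as the right-hand side of a regression formula, it can help to see which terms a given string expands to. The following is a minimal base R sketch using toy data invented for illustration; it shows standard formula expansion only, and tmleMSM's internal handling of the MSM argument may differ.

```r
# Toy data (hypothetical): binary treatment A and binary stratum variable V
set.seed(1)
toy <- data.frame(A = rbinom(8, 1, 0.5), V = rbinom(8, 1, 0.5))

# Expand the right-hand side "A*V" with R's standard formula machinery
msm_rhs <- "A*V"
X <- model.matrix(as.formula(paste("~", msm_rhs)), data = toy)

# One column per MSM coefficient: intercept, A, V, and the A:V interaction
colnames(X)
```

The four columns line up with psi0, psi1, psi2, psi3 of the saturated MSM used in Example 1.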

Usage

tmleMSM(Y, A, W, V, T = rep(1,length(Y)), Delta = rep(1, length(Y)), MSM, 
        v = NULL, Q = NULL, Qform = NULL, Qbounds = c(-Inf, Inf), 
        Q.SL.library = c("SL.glm", "tmle.SL.dbarts2", "SL.glmnet"), 
        cvQinit = TRUE, hAV = NULL, hAVform = NULL, g1W = NULL, 
        gform = NULL, pDelta1 = NULL, g.Deltaform = NULL, 
        g.SL.library = c("SL.glm", "tmle.SL.dbarts.k.5", "SL.gam"),
        ub = 1/0.025, family = "gaussian", fluctuation = "logistic", 
        alpha = 0.995, id = 1:length(Y), V_SL = 5, inference = TRUE, 
        verbose = FALSE, Q.discreteSL = FALSE, g.discreteSL = FALSE)

Arguments

`Y`	continuous or binary outcome variable
`A`	binary treatment indicator, `1` - treatment, `0` - control
`W`	vector, matrix, or dataframe containing baseline covariates. Factors are not currently allowed.
`V`	vector, matrix, or dataframe of covariates used to define strata
`T`	optional time for repeated measures data
`Delta`	indicator of missing outcome or treatment assignment. `1` - observed, `0` - missing
`MSM`	MSM of interest, specified as valid right hand side of a regression formula (see examples)
`v`	optional value defining the strata of interest (V=v) for stratified estimation of MSM parameter
`Q`	optional nx2 matrix of initial values for Q portion of the likelihood, (E(Y\|A=0,W), E(Y\|A=1,W))
`Qform`	optional regression formula for estimation of E(Y\|A, W), suitable for call to `glm`
`Qbounds`	vector of lower and upper bounds on `Y` and predicted values for initial `Q`
`Q.SL.library`	optional vector of prediction algorithms to use for `SuperLearner` estimation of initial `Q`
`cvQinit`	logical, if `TRUE`, estimates cross-validated predicted values using discrete super learning, default=`TRUE`
`hAV`	optional nx2 matrix used in numerator of weights for updating covariate and the influence curve. If unspecified, defaults to conditional probabilities P(A=1\|V) or P(A=1\|T), for repeated measures data. For unstabilized weights, pass in an nx2 matrix of all 1s
`hAVform`	optional regression formula of the form `A~V+T`; if specified, this overrides the call to `SuperLearner`
`g1W`	optional vector of conditional treatment assignment probabilities, P(A=1\|W)
`gform`	optional regression formula of the form `A~W`, if specified this overrides the call to `SuperLearner`
`pDelta1`	optional nx2 matrix of conditional probabilities for missingness mechanism, P(Delta=1\|A=0,V,W,T), P(Delta=1\|A=1,V,W,T).
`g.Deltaform`	optional regression formula of the form `Delta~A+W`, if specified this overrides the call to `SuperLearner`
`g.SL.library`	optional vector of prediction algorithms to use for `SuperLearner` estimation of `g1W` or `pDelta1`
`ub`	upper bound on observation weights. See `Details` section for more information
`family`	family specification for working regression models, generally ‘gaussian’ for continuous outcomes (default), ‘binomial’ for binary outcomes
`fluctuation`	‘logistic’ (default), or ‘linear’
`alpha`	used to keep predicted initial values bounded away from 0 and 1 for logistic fluctuation
`id`	optional subject identifier
`V_SL`	number of cross-validation folds for Super Learner estimation of Q and g
`inference`	if `TRUE`, variance-covariance matrix, standard errors, p-values, and 95% confidence intervals are calculated. Setting to `FALSE` saves a little time when bootstrapping.
`verbose`	status messages printed if set to `TRUE` (default=`FALSE`)
`Q.discreteSL`	if `TRUE`, use discrete SL to estimate Q, otherwise ensemble SL by default. Ignored when SL is not used.
`g.discreteSL`	if `TRUE`, use discrete SL to estimate components of g, otherwise ensemble SL by default. Ignored when SL is not used.
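
The `Delta` argument expects `1` for observed and `0` for missing. A minimal base R sketch of building such an indicator is shown here, assuming the raw outcome marks missing values with `NA`; the vector `Y_raw` is hypothetical, and how tmleMSM itself treats `NA` entries in `Y` is not stated on this page.

```r
# Hypothetical raw outcome in which missing values are coded as NA
Y_raw <- c(1.2, NA, 0.7, NA, 2.1)

# Delta = 1 where the outcome is observed, 0 where it is missing
Delta <- as.integer(!is.na(Y_raw))

# Proportion of observations with a missing outcome
prop_missing <- mean(Delta == 0)
```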

Details

ub bounds the influence curve (IC) by restricting the factor h(A,V)/[g(A,V,W)P(Delta=1|A,V,W)] to lie between 0 and ub. The default value is 1/0.025.
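
To make the bounding concrete, the sketch below computes the factor h(A,V)/[g(A,V,W)P(Delta=1|A,V,W)] for a handful of made-up observations and truncates it at ub. All probabilities are hypothetical values chosen for illustration, and the code is not taken from the package internals.

```r
# Hypothetical inputs for six observations
A       <- c(1, 0, 1, 0, 1, 0)                      # observed treatment
g1W     <- c(0.01, 0.30, 0.50, 0.70, 0.95, 0.99)    # assumed P(A=1|W)
pDelta1 <- rep(0.9, 6)                              # assumed P(Delta=1|A,V,W)
hA1     <- 0.5                                      # assumed numerator P(A=1|V)

# Probability of the treatment actually received, denominator and numerator
gA <- ifelse(A == 1, g1W, 1 - g1W)
hA <- ifelse(A == 1, hA1, 1 - hA1)

# Unbounded factor, then truncation at the default upper bound
ub     <- 1/0.025
wt_raw <- hA / (gA * pDelta1)
wt     <- pmin(wt_raw, ub)

# Compare the raw and bounded factors side by side
round(cbind(wt_raw, wt), 2)
```

Observations whose received treatment had a very small estimated probability (the first and last here) are the ones affected by the bound. Replacing `hA` with a vector of 1s corresponds to the unstabilized weights described for the `hAV` argument.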

Value

`psi`	MSM parameter estimate
`sigma`	variance covariance matrix
`se`	standard errors extracted from sigma
`pvalue`	two-sided p-value
`lb`	lower bound on 95% confidence interval
`ub`	upper bound on 95% confidence interval
`epsilon`	fitted value of epsilon used to target initial `Q`
`psi.Qinit`	MSM parameter estimate based on untargeted initial `Q`
`Qstar`	targeted estimate of `Q`, an nx2 matrix with predicted values for `Q(0,W), Q(1,W)` using the updated fit
`Qinit`	initial estimate of `Q`. `Qinit$coef` are the coefficients for a `glm` model for `Q`, if applicable. `Qinit$Q` is an nx2 matrix, where `n` is the number of observations. Columns contain predicted values for `Q(0,W),Q(1,W)` using the initial fit. `Qinit$type` is method for estimating `Q`
`g`	treatment mechanism estimate. A list with three items: `g$g1W` contains estimates of P(A=1\|W) for each observation, `g$coef` the coefficients for the model for g when `glm` used, `g$type` estimation procedure
`g.AV`	estimate for h(A,V) or h(A,T). A list with three items: `g.AV$g1W` an nx2 matrix containing values of P(A=0\|V,T), P(A=1\|V,T) for each observation, `g.AV$coef` the coefficients for the model for g when `glm` used, `g.AV$type` estimation procedure
`g_Delta`	missingness mechanism estimate. A list with three items: `g_Delta$g1W` an nx2 matrix containing values of P(Delta=1\|A,V,W,T) for each observation, `g_Delta$coef` the coefficients for the model for g when `glm` used, `g_Delta$type` estimation procedure
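
The relationship between `psi`, `sigma`, `se`, `pvalue`, `lb`, and `ub` can be sketched as follows, assuming standard Wald-type inference based on a normal approximation. The numbers are made up for illustration and are not output from tmleMSM; the package's exact computation may differ in detail.

```r
# Hypothetical parameter estimates and variance-covariance matrix
psi   <- c("(Intercept)" = 0.10, A = 0.95, V = -1.90)
sigma <- diag(c(0.04, 0.09, 0.16))
dimnames(sigma) <- list(names(psi), names(psi))

# Standard errors are the square roots of the diagonal of sigma
se <- sqrt(diag(sigma))

# Wald-type 95% confidence limits and two-sided p-values (assumed form)
z      <- qnorm(0.975)
lb     <- psi - z * se
ub     <- psi + z * se
pvalue <- 2 * pnorm(-abs(psi / se))

# Collect the pieces in one table
cbind(psi, se, lb, ub, pvalue)
```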

Author(s)

Susan Gruber sgruber@cal.berkeley.edu, in collaboration with Mark van der Laan.

References

1. Gruber, S. and van der Laan, M.J. (2012), tmle: An R Package for Targeted Maximum Likelihood Estimation. Journal of Statistical Software, 51(13), 1-35. http://www.jstatsoft.org/v51/i13/

2. Rosenblum, M. and van der Laan, M.J. (2010), Targeted Maximum Likelihood Estimation of the Parameter of a Marginal Structural Model. The International Journal of Biostatistics, 6(2).

Examples

library(tmle)
# Example 1. Estimating MSM parameter with correctly specified regression formulas
# MSM: psi0 + psi1*A + psi2*V + psi3*A*V  (saturated)
# true parameter value: psi = (0, 1, -2, 0.5) 
# generate data
  set.seed(100)
  n <- 1000
  W <- matrix(rnorm(n*3), ncol = 3)
  colnames(W) <- c("W1", "W2", "W3")
  V <- rbinom(n, 1, 0.5)
  A <- rbinom(n, 1, 0.5)
  Y <- rbinom(n, 1, plogis(A - 2*V + 0.5*A*V))
  result.ex1 <- tmleMSM(Y, A, W, V, MSM = "A*V", Qform = Y~., gform = A~1, 
                        hAVform = A~1, family = "binomial")
  print(result.ex1)
## Not run: 
# Example 2. Repeated measures data, two observations per id
# (e.g., crossover study design)
# MSM: psi0 + psi1*A + psi2*V + psi3*V^2 + psi4*T
# true parameter value: psi = (-2, 1, 0, -2, 0 )
# generate data in wide format (id,  W1, Y(t),  W2(t), V(t), A(t)) 
   set.seed(10)
   n <- 250
   id <- 1:n
   W1   <- rbinom(n, 1, 0.5)
   W2.1 <- rnorm(n)
   W2.2 <- rnorm(n)  
   V.1   <- rnorm(n)  
   V.2   <- rnorm(n)
   A.1 <- rbinom(n, 1, plogis(0.5 + 0.3 * W2.1))
   A.2 <- 1-A.1
   Y.1  <- -2 + A.1 - 2*V.1^2 + W2.1 + rnorm(n)
   Y.2  <- -2 + A.2 - 2*V.2^2 + W2.2 + rnorm(n)
   d <- data.frame(id, W1, W2=W2.1, W2.2, V=V.1, V.2, A=A.1, A.2, Y=Y.1, Y.2)

# change dataset from wide to long format
   longd <- reshape(d, 
          varying = cbind(c(3, 5, 7, 9), c(4, 6, 8, 10)),
          idvar = "id",
          direction = "long",
          timevar = "T",
          new.row.names = NULL,
          sep = "")
# misspecified model for initial Q, partial misspecification for g. 
# V_SL set to 2 to save time, not recommended at this sample size
   result.ex2 <- tmleMSM(Y = longd$Y, A = longd$A, W = longd[,c("W1", "W2")], V = longd$V, 
          T = longd$T, MSM = "A + V + I(V^2) + T", Qform = Y ~ A + V, gform = A ~ W2, 
          id = longd$id, V_SL=2)
   print(result.ex2)


# Example 3:  Introduce roughly 20% missingness via Delta (P(Delta = 1) = 0.8)
# V_SL set to 2 to save time, not recommended at this sample size
  Delta <- rbinom(nrow(longd), 1, 0.8)
  result.ex3 <- tmleMSM(Y = longd$Y, A = longd$A, W = longd[,c("W1", "W2")], V = longd$V, T=longd$T,
          Delta = Delta, MSM = "A + V + I(V^2) + T", Qform = Y ~ A + V, gform = A ~ W2,
          g.Deltaform = Delta ~ 1, id=longd$id, verbose = TRUE, V_SL=2)
  print(result.ex3)

## End(Not run)
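
The wide-to-long step in Example 2 selects columns by position. As an alternative sketch, the same kind of reshaping can be written with column names, which may be easier to read and less fragile if the column order changes. The small data frame below is invented for illustration and only mirrors the layout of Example 2; it is not the package's own example.

```r
# Small wide-format data set with the same layout as Example 2 (hypothetical values)
set.seed(10)
n <- 4
d <- data.frame(
  id = 1:n,
  W1 = rbinom(n, 1, 0.5),
  W2 = rnorm(n), W2.2 = rnorm(n),
  V  = rnorm(n), V.2  = rnorm(n),
  A  = c(1, 0, 1, 0), A.2 = c(0, 1, 0, 1),
  Y  = rnorm(n), Y.2  = rnorm(n)
)

# Pair up the time-1 and time-2 columns by name and stack them
longd <- reshape(d,
  varying   = list(c("W2", "W2.2"), c("V", "V.2"), c("A", "A.2"), c("Y", "Y.2")),
  v.names   = c("W2", "V", "A", "Y"),
  timevar   = "T",
  times     = 1:2,
  idvar     = "id",
  direction = "long")

# Two rows per subject: all T = 1 rows first, then all T = 2 rows
longd[order(longd$id, longd$T), ]
```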

tmle

Targeted Maximum Likelihood Estimation

v1.5.0-1

BSD_3_clause + file LICENSE | GPL-2

Authors

Susan Gruber [aut, cre], Mark van der Laan [aut], Chris Kennedy [ctr]

Initial release

2020-05-20