General Template for Collaborative Targeted Maximum Likelihood Estimation
This function computes the Collaborative Targeted Maximum Likelihood Estimator (C-TMLE) from a user-supplied sequence of propensity score estimates.
Usage

ctmleGeneral(Y, A, W, Wg = W, Q, ctmletype, gn_candidates,
  gn_candidates_cv = NULL, alpha = 0.995, family = "gaussian",
  gbound = 0.025, like_type = "RSS", fluctuation = "logistic",
  verbose = FALSE, detailed = FALSE, PEN = FALSE, g1W = NULL,
  g1WPrev = NULL, V = 5, folds = NULL, stopFactor = 10^6)
Arguments

Y
  continuous or binary outcome variable
A
  binary treatment indicator, 1 for treatment, 0 for control
W
  vector, matrix, or data frame containing baseline covariates for Q bar
Wg
  vector, matrix, or data frame containing baseline covariates for the propensity score model (defaults to W if not supplied by the user)
Q
  n by 2 matrix of initial values for Q0W and Q1W in columns 1 and 2, respectively. The current version does not support SuperLearner for automatic initial estimation of Q bar
ctmletype
  1 or 3. Type of general C-TMLE. Type 1 uses cross-validation to select the best gn, while Type 3 directly solves the extra clever covariates
gn_candidates
  matrix or data frame in which each column is an estimate of the propensity score. Columns with larger indices should have smaller empirical loss (see the sketch following this argument list)
gn_candidates_cv
  matrix or data frame in which each column is the corresponding cross-validated estimate. For example, the (i, j)-th element is the propensity score predicted by the j-th estimator for the i-th observation when that observation is in the validation set
alpha
  used to keep the predicted initial values bounded away from 0 and 1 for the logistic fluctuation; 0.995 (default)
family
  family specification for the working regression models, generally 'gaussian' for continuous outcomes (default), 'binomial' for binary outcomes
gbound
  bound on P(A=1|W), defaults to 0.025
like_type
  'RSS' or 'loglike'. The metric to use for forward selection and cross-validation
fluctuation
  'logistic' (default) or 'linear', for the targeting step
verbose
  print status messages if TRUE
detailed
  boolean. If TRUE, return more detailed results
PEN
  boolean. If TRUE, the penalized loss is used in the cross-validation step
g1W
  only used when ctmletype is 3; a user-supplied propensity score estimate
g1WPrev
  only used when ctmletype is 3; a user-supplied propensity score estimate with a small fluctuation relative to g1W
V
  number of folds. Only used if folds is not specified
folds
  the list of indices for the cross-validation step. We recommend that the CV splits used by C-TMLE match those used for gn_candidates_cv (see the sketch following this argument list)
stopFactor
  numerical value with default 1e6. If the current empirical likelihood is stopFactor times larger than the best previous one, the construction stops
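The gn_candidates, gn_candidates_cv, and folds arguments are typically built together. A minimal sketch, assuming A and W are defined as in the example below and that the SuperLearner wrappers SL.glm and SL.glmnet are available (the two-learner library here is purely illustrative):

library(ctmle)
library(SuperLearner)

# Construct folds, gn_candidates, and gn_candidates_cv together, so the
# CV splits used by C-TMLE match those used for the propensity candidates
V <- 5
N <- length(A)                 # A and W as described above
folds <- by(sample(1:N, N), rep(1:V, length = N), list)

# Candidate propensity learners; build_gn_seq (from this package) fits each
# learner and returns the candidate estimates and their cross-validated
# counterparts, one column per learner
SL.library <- c("SL.glm", "SL.glmnet")
gn_seq <- build_gn_seq(A = A, W = W, SL.library = SL.library, folds = folds)

# gn_seq$gn_candidates, gn_seq$gn_candidates_cv, and gn_seq$folds can then
# be passed directly to ctmleGeneral()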
Value

The function returns an object containing the following components:

best_k
  the index of the gn estimate selected by cross-validation
est
  estimate of psi_0
CI
  IC-based 95% confidence interval
pvalue
  p-value for the null hypothesis that Psi = 0
likelihood
  sum of squared residuals for the selected estimator evaluated on all observations, or the logistic log-likelihood if like_type != "RSS"
varIC
  empirical variance of the influence curve adjusted for the estimation of g
varDstar
  empirical variance of the influence curve
var.psi
  variance of the estimate
varIC.cv
  cross-validated variance of the influence curve
penlikelihood.cv
  penalized cross-validated likelihood
cv.res
  all cross-validation results for each fold
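A brief sketch of how these components might be accessed, where fit is a placeholder name for the object returned by ctmleGeneral:

fit$best_k    # index of the gn candidate selected by cross-validation
fit$est       # point estimate of psi_0
fit$CI        # IC-based 95% confidence interval
fit$pvalue    # p-value for the null hypothesis Psi = 0
fit$var.psi   # variance of the estimate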
Examples

library(ctmle)
library(SuperLearner)
library(glmnet)

N <- 1000
p <- 100
V <- 5

# Simulate baseline covariates and a treatment whose propensity score
# depends on the covariates
Wmat <- matrix(rnorm(N * p), ncol = p)
gcoef <- matrix(c(-1, -1, rep(-(3 / (p - 2)), p - 2)), ncol = 1)
W <- as.data.frame(Wmat)
g <- 1 / (1 + exp(Wmat %*% gcoef / 3))
A <- rbinom(N, 1, prob = g)

# Build potential outcome pairs, and the observed outcome Y
beta1 <- 4 + 2 * Wmat[, 1] + 2 * Wmat[, 2] + 2 * Wmat[, 5] + 2 * Wmat[, 6] + 2 * Wmat[, 8]
beta0 <- 2 + 2 * Wmat[, 1] + 2 * Wmat[, 2] + 2 * Wmat[, 5] + 2 * Wmat[, 6] + 2 * Wmat[, 8]
tau <- 2
sigma <- 1
epsilon <- rnorm(N, 0, sigma)
Y <- beta0 + tau * A + epsilon

# Initial estimate of Q: simple treatment-arm means
Q <- cbind(rep(mean(Y[A == 1]), N), rep(mean(Y[A == 0]), N))

# Cross-validation folds shared by build_gn_seq and ctmleGeneral
folds <- by(sample(1:N, N), rep(1:V, length = N), list)

# Pick a sequence of LASSO penalties around the cross-validated optimum
lasso_fit <- cv.glmnet(x = as.matrix(W), y = A, alpha = 1, nlambda = 100, nfolds = 10)
lasso_lambdas <- lasso_fit$lambda[lasso_fit$lambda <= lasso_fit$lambda.min][1:5]

# Build a SuperLearner wrapper template for glmnet with a fixed lambda
SL.glmnet_new <- function(Y, X, newX, family, obsWeights, id,
                          alpha = 1, nlambda = 100, lambda = 0, ...) {
  if (!is.matrix(X)) {
    X <- model.matrix(~ -1 + ., X)
    newX <- model.matrix(~ -1 + ., newX)
  }
  fit <- glmnet::glmnet(x = X, y = Y, lambda = lambda,
                        family = family$family, alpha = alpha)
  pred <- predict(fit, newx = newX, type = "response")
  fit <- list(object = fit)
  class(fit) <- "SL.glmnet"
  out <- list(pred = pred, fit = fit)
  return(out)
}

# Use a sequence of LASSO estimators to build the gn sequence
SL.cv1lasso <- function(..., alpha = 1, lambda = lasso_lambdas[1]) {
  SL.glmnet_new(..., alpha = alpha, lambda = lambda)
}
SL.cv2lasso <- function(..., alpha = 1, lambda = lasso_lambdas[2]) {
  SL.glmnet_new(..., alpha = alpha, lambda = lambda)
}
SL.cv3lasso <- function(..., alpha = 1, lambda = lasso_lambdas[3]) {
  SL.glmnet_new(..., alpha = alpha, lambda = lambda)
}
SL.cv4lasso <- function(..., alpha = 1, lambda = lasso_lambdas[4]) {
  SL.glmnet_new(..., alpha = alpha, lambda = lambda)
}
SL.library <- c('SL.cv1lasso', 'SL.cv2lasso', 'SL.cv3lasso', 'SL.cv4lasso', 'SL.glm')

# Build the sequence. See more details in the function build_gn_seq,
# and the SuperLearner package
gn_seq <- build_gn_seq(A = A, W = W, SL.library = SL.library, folds = folds)

# Use the output of build_gn_seq for the general C-TMLE template
ctmle_fit <- ctmleGeneral(Y = Y, A = A, W = W, Q = Q, ctmletype = 1,
                          gn_candidates = gn_seq$gn_candidates,
                          gn_candidates_cv = gn_seq$gn_candidates_cv,
                          folds = gn_seq$folds, V = length(folds))
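As a quick follow-up to the example above, the selected candidate and headline estimate can be checked against the simulation truth (tau = 2); calling summary() on the fit is an assumption here, based on the print/summary methods the package provides for its other C-TMLE estimators:

ctmle_fit$best_k   # which gn candidate cross-validation selected
ctmle_fit$est      # should be close to the true effect tau = 2

# Assumed to work if ctmleGeneral returns the package's usual ctmle class
summary(ctmle_fit)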