Calculate P-values Based on Multi-Splitting Approach
Calculate p-values and confidence intervals based on the multi-splitting approach
multi.split(x, y, B = 100, fraction = 0.5, ci = TRUE, ci.level = 0.95,
            model.selector = lasso.cv,
            classical.fit = lm.pval, classical.ci = lm.ci,
            parallel = FALSE, ncores = getOption("mc.cores", 2L),
            gamma = seq(ceiling(0.05 * B) / B, 1 - 1 / B, by = 1 / B),
            args.model.selector = NULL, args.classical.fit = NULL,
            args.classical.ci = NULL,
            return.nonaggr = FALSE, return.selmodels = FALSE,
            repeat.max = 20, verbose = FALSE)
x |
numeric design matrix (without intercept). |
y |
numeric response vector. |
B |
the number of sample-splits, a positive integer. |
fraction |
a number in (0,1), the fraction of data used at each sample split for the model selection process. The remaining data is used for calculating the p-values. |
ci |
logical indicating if a confidence interval should be calculated for each parameter. |
ci.level |
(if |
model.selector |
a |
classical.fit |
a |
classical.ci |
a |
parallel |
logical indicating if parallelization via
|
ncores |
number of cores used for parallelization as
|
gamma |
vector of gamma-values. If gamma is a scalar, the value Q_j is calculated instead of P_j (see Meinshausen et al., 2009, in the references below). |
args.model.selector |
named |
args.classical.fit |
named |
args.classical.ci |
named |
return.nonaggr |
|
return.selmodels |
|
repeat.max |
positive integer indicating the maximal number of split trials. This should not matter in regular cases, but it is necessary to prevent infinite loops in borderline cases. |
verbose |
logical indicating whether information should be printed during the computation. |
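The default gamma grid and the effect of fraction can be inspected in base R without running the procedure. The sketch below evaluates the default gamma expression from the usage for B = 100 and draws a single sample split. The floor() rounding of the selection part is an assumption, and the names n.select, idx.select and idx.infer are illustrative only.

```r
# Default gamma grid, evaluated with the same expression as in the usage
B <- 100
gamma <- seq(ceiling(0.05 * B) / B, 1 - 1 / B, by = 1 / B)
range(gamma)    # smallest and largest gamma on the grid
length(gamma)   # number of gamma values

# One sample split (assumption: floor() rounding for the selection part)
n <- 40
fraction <- 0.5
n.select <- floor(n * fraction)
set.seed(1)                                    # for reproducibility only
idx.select <- sample(seq_len(n), n.select)     # rows used for model selection
idx.infer  <- setdiff(seq_len(n), idx.select)  # remaining rows, used for p-values
```

With fraction = 0.5 the two parts have equal size; a larger fraction gives the model selector more data but leaves fewer observations for the classical fit.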
pval.corr |
Vector of p-values corrected for multiple testing. |
gamma.min |
Value of gamma at which the minimal p-value was attained. |
clusterGroupTest |
Function to perform groupwise tests based on
hierarchical clustering. You can either provide a distance matrix
and clustering method or the output of hierarchical clustering from
the function |
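How the aggregated p-values and the reported gamma relate to the per-split p-values can be sketched with the aggregation rule of Meinshausen et al. (2009): for a fixed gamma, Q_j(gamma) = min(1, gamma-quantile of P_j^(b) / gamma over the B splits), and over a grid of gamma values, P_j = min(1, (1 - log(gamma_lower)) * min over gamma of Q_j(gamma)), where gamma_lower is the smallest value on the grid. The function aggregate.pvals below is a hypothetical base-R illustration, not the package's internal code. In particular, the quantile type is an assumption (R's default is used here).

```r
# pvals: B x p matrix of per-split p-values (one row per split)
# gamma: scalar or vector of gamma values in (0, 1]
aggregate.pvals <- function(pvals, gamma) {
  stopifnot(is.matrix(pvals), all(gamma > 0), all(gamma <= 1))
  p <- ncol(pvals)
  # Q[j, k] = min(1, gamma_k-quantile of pvals[, j] / gamma_k)
  Q <- vapply(gamma, function(g) {
    pmin(1, apply(pvals / g, 2, quantile, probs = g, names = FALSE))
  }, numeric(p))
  Q <- matrix(Q, nrow = p)               # keep matrix shape when p == 1
  if (length(gamma) == 1L)               # scalar gamma: return Q_j itself
    return(list(pval = Q[, 1], gamma.min = rep(gamma, p)))
  k <- apply(Q, 1, which.min)            # grid position of the minimum per variable
  q.min <- Q[cbind(seq_len(p), k)]
  list(pval = pmin(1, (1 - log(min(gamma))) * q.min),  # penalised minimum
       gamma.min = gamma[k])
}

# Toy input: variable 1 is significant in every split, variable 2 never
B <- 100
pvals <- cbind(rep(1e-6, B), rep(1, B))
res <- aggregate.pvals(pvals, gamma = seq(0.05, 0.99, by = 0.01))
res$pval
res$gamma.min
```

The factor 1 - log(gamma_lower) is the price paid for searching over the grid instead of fixing gamma in advance, which is why a scalar gamma yields Q_j without this penalty.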
Lukas Meier, Ruben Dezeure, Jacopo Mandozzi
Meinshausen, N., Meier, L. and Bühlmann, P. (2009) P-values for high-dimensional regression. Journal of the American Statistical Association 104, 1671–1681.
Mandozzi, J. and Bühlmann, P. (2015) A sequential rejection testing method for high-dimensional regression with correlated variables. To appear in the International Journal of Biostatistics. Preprint arXiv:1502.03300
n <- 40 # a bit small, to keep example "fast"
p <- 256
x <- matrix(rnorm(n*p), nrow = n, ncol = p)
y <- x[,1] * 2 + x[,2] * 2.5 + rnorm(n)

## Multi-splitting with lasso.firstq as model selector function
## 'q' must be specified
fit.multi <- multi.split(x, y, model.selector = lasso.firstq,
                         args.model.selector = list(q = 10))
fit.multi
head(fit.multi$pval.corr, 10) ## the first 10 p-values
ci. <- confint(fit.multi)
head(ci.) # the first 6
stopifnot(all.equal(ci.,
                    with(fit.multi, cbind(lci, uci)),
                    check.attributes=FALSE))

## Use default 'lasso.cv' (slower!!) -- incl cluster group testing:
system.time(fit.m2 <- multi.split(x, y, return.selmodels = TRUE))# 9 sec (on "i7")
head(fit.m2$pval.corr) ## the first 6 p-values
head(confint(fit.m2))  ## the first 6 95% conf.intervals

## Now do clustergroup testing
clGTst <- fit.m2$clusterGroupTest
names(envGT <- environment(clGTst))# about 14
if(!interactive()) # if you are curious (and advanced):
  print(ls.str(envGT), max = 0)
stopifnot(identical(clGTst, envGT$clusterGroupTest))
ccc <- clGTst()
str(ccc)
ccc$hh # the clustering
has.1.or.2 <- sapply(ccc$clusters,
                     function(j.set) any(c(1,2) %in% j.set))
ccc$pval[ has.1.or.2] ## all very small
ccc$pval[!has.1.or.2] ## all 1
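A user-supplied model.selector can be sketched along the same lines as the lasso.firstq call in the examples. The sketch assumes that a selector takes x and y (plus any extra arguments passed through args.model.selector) and returns the indices of the selected columns. That contract is not spelled out in the argument description above, so treat it as an assumption. The name top.cor.selector is hypothetical, and marginal-correlation screening is only a simple stand-in for the lasso.

```r
# Hypothetical selector: keep the q columns with the largest absolute
# marginal correlation with y and return their column indices.
top.cor.selector <- function(x, y, q = 10, ...) {
  score <- abs(drop(cor(x, y)))      # one score per column of x
  q <- min(q, ncol(x))
  sort(order(score, decreasing = TRUE)[seq_len(q)])
}

set.seed(42)                          # for reproducibility only
n <- 40
p <- 256
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- x[, 1] * 2 + x[, 2] * 2.5 + rnorm(n)

sel <- top.cor.selector(x, y, q = 10)
sel                                   # indices of the selected columns

## Assumed usage with the package loaded (not run here):
## fit <- multi.split(x, y, model.selector = top.cor.selector,
##                    args.model.selector = list(q = 10))
```

Whatever selector is used, it should select fewer variables than there are observations left in the inference part of each split, since the classical fit is computed on that part only.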