Extracting predicted risks from regression models
Extract event probabilities from fitted regression models and machine learning objects.
The function predictRisk is a generic function, meaning that it invokes
specifically designed functions depending on the 'class' of the first
argument. See predictRisk.
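The dispatch mechanism described above can be sketched in base R. This is only an illustration of S3 dispatch, not the package's implementation: the generic `my_risk` and its methods are hypothetical names chosen so they do not clash with predictRisk.

```r
# Hypothetical generic (not part of riskRegression): dispatches on class(object).
my_risk <- function(object, newdata, ...) UseMethod("my_risk")

# Fallback used when no class-specific method exists.
my_risk.default <- function(object, newdata, ...) {
  stop("no my_risk method for class ", class(object)[1])
}

# Method for fitted logistic regression models of class 'glm'.
my_risk.glm <- function(object, newdata, ...) {
  as.numeric(predict(object, newdata = newdata, type = "response"))
}

# Simulated binary outcome data (illustration only).
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- rbinom(50, 1, plogis(d$x))
fit <- glm(y ~ x, data = d, family = binomial())

# class(fit) is c("glm", "lm"), so my_risk.glm is invoked.
p <- my_risk(fit, newdata = data.frame(x = c(-1, 0, 1)))
```

Calling `my_risk` on an object without a matching method falls through to `my_risk.default`, which stops with an error.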
predictRisk(object, newdata, ...)

## Default S3 method:
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'double'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'integer'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'factor'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'numeric'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'glm'
predictRisk(object, newdata, iid = FALSE, average.iid = FALSE, ...)

## S3 method for class 'formula'
predictRisk(object, newdata, ...)

## S3 method for class 'BinaryTree'
predictRisk(object, newdata, ...)

## S3 method for class 'lrm'
predictRisk(object, newdata, ...)

## S3 method for class 'rpart'
predictRisk(object, newdata, ...)

## S3 method for class 'randomForest'
predictRisk(object, newdata, ...)

## S3 method for class 'matrix'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'aalen'
predictRisk(object, newdata, times, ...)

## S3 method for class 'cox.aalen'
predictRisk(object, newdata, times, ...)

## S3 method for class 'coxph'
predictRisk(
  object,
  newdata,
  times,
  product.limit = FALSE,
  diag = FALSE,
  iid = FALSE,
  average.iid = FALSE,
  ...
)

## S3 method for class 'coxphTD'
predictRisk(object, newdata, times, landmark, ...)

## S3 method for class 'CSCTD'
predictRisk(object, newdata, times, cause, landmark, ...)

## S3 method for class 'coxph.penal'
predictRisk(object, newdata, times, ...)

## S3 method for class 'cph'
predictRisk(
  object,
  newdata,
  times,
  product.limit = FALSE,
  diag = FALSE,
  iid = FALSE,
  average.iid = FALSE,
  ...
)

## S3 method for class 'selectCox'
predictRisk(object, newdata, times, ...)

## S3 method for class 'prodlim'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'survfit'
predictRisk(object, newdata, times, ...)

## S3 method for class 'psm'
predictRisk(object, newdata, times, ...)

## S3 method for class 'ranger'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'rfsrc'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'FGR'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'riskRegression'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'ARR'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'CauseSpecificCox'
predictRisk(
  object,
  newdata,
  times,
  cause,
  product.limit = TRUE,
  diag = FALSE,
  iid = FALSE,
  average.iid = FALSE,
  ...
)

## S3 method for class 'penfitS3'
predictRisk(object, newdata, times, ...)

## S3 method for class 'SuperPredictor'
predictRisk(object, newdata, ...)

## S3 method for class 'gbm'
predictRisk(object, newdata, times, ...)

## S3 method for class 'flexsurvreg'
predictRisk(object, newdata, times, ...)

## S3 method for class 'singleEventCB'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'wglm'
predictRisk(
  object,
  newdata,
  times = NULL,
  product.limit = FALSE,
  diag = FALSE,
  iid = FALSE,
  average.iid = FALSE,
  ...
)
object |
A fitted model from which to extract predicted event probabilities. |
newdata |
A data frame containing predictor variable combinations for which to compute predicted event probabilities. |
... |
Additional arguments that are passed on to the current method. |
times |
A vector of times in the range of the response variable, for which the cumulative incidences (event probabilities) are computed. |
cause |
Identifies the cause of interest among the competing events. |
iid |
Should the iid decomposition be output using an attribute? |
average.iid |
Should the average iid decomposition be output using an attribute? |
product.limit |
If |
diag |
when |
landmark |
The starting time for the computation of the cumulative risk. |
In uncensored binary outcome data there is no need to choose a time point.
When operating on models for survival analysis (without competing risks), the function still predicts the risk, computed as 1 - S(t|X), where S(t|X) is the survival probability of a subject characterized by X.
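The relation risk = 1 - S(t|X) can be reproduced by hand with the survival package. The sketch below is not how predictRisk is implemented internally; it assumes the `lung` data shipped with survival, and the covariate values and time points are arbitrary illustrative choices.

```r
library(survival)

# Cox model on the lung data that ships with the survival package.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Two hypothetical subjects and three prediction horizons (in days).
nd <- data.frame(age = c(50, 70), sex = c(1, 2))
tt <- c(180, 365, 730)

# Predicted survival curves, one per row of nd.
sf <- survfit(fit, newdata = nd)

# Survival probabilities at the requested times: rows = times, columns = subjects.
S <- summary(sf, times = tt, extend = TRUE)$surv

# Risk = 1 - survival, transposed so that rows = subjects and columns = times.
risk <- 1 - t(S)
```

The resulting matrix has the layout described for predictRisk: one row per subject in the new data and one column per time point.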
When there are competing risks (and the data are right censored), one needs to specify both the time horizon for prediction (which can be a vector) and the cause of the event. The function then extracts the absolute risk F_c(t|X), also known as the cumulative incidence of an event of type/cause c until time t, for a subject characterized by X. Depending on the model, it may or may not be possible to predict the risk of all causes in a competing risks setting. For example, a cause-specific Cox (CSC) object allows predicting the risk of each cause, whereas a Fine-Gray regression model (FGR) is specific to one of the causes.
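To make the notion of absolute risk concrete, here is a small numeric sketch in base R. It assumes two competing causes with constant cause-specific hazards; the hazard values are made up, and this special case is for illustration only, not a description of any fitted model. With constant hazards, F_c(t) = lambda_c / (lambda_1 + lambda_2) * (1 - S(t)), where S(t) = exp(-(lambda_1 + lambda_2) * t) is the event-free survival.

```r
# Hypothetical constant cause-specific hazards (assumed values, not estimates).
lambda <- c(cause1 = 0.10, cause2 = 0.05)
times <- c(1, 5, 10)

# Event-free survival: S(t) = exp(-(lambda1 + lambda2) * t).
surv <- exp(-sum(lambda) * times)

# Cumulative incidence per cause: F_c(t) = lambda_c / sum(lambda) * (1 - S(t)).
# Result: one row per time point, one column per cause.
cuminc <- sapply(lambda, function(l) l / sum(lambda) * (1 - surv))
rownames(cuminc) <- paste0("t=", times)

# The cause-specific risks add up to the overall event probability 1 - S(t).
total <- rowSums(cuminc)
```

Each column of `cuminc` is the absolute risk of one cause. Their sum equals 1 - S(t), which is why the risk of a single cause depends on the hazards of all causes.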
For binary outcome, a vector with predicted risks. For survival outcome with and without competing risks, a matrix with as many rows as NROW(newdata) and as many columns as length(times). Each entry is a probability, and within each row the values should be non-decreasing over time.
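The properties listed above can be checked programmatically. The helper below is a hypothetical utility, not part of riskRegression; it assumes `times` is sorted in increasing order and allows a small numerical tolerance.

```r
# Hypothetical helper: checks that p looks like a risk matrix as described above.
check_risk_matrix <- function(p, newdata, times, tol = 1e-12) {
  stopifnot(is.matrix(p))
  stopifnot(nrow(p) == NROW(newdata), ncol(p) == length(times))
  # Every entry must be a probability.
  stopifnot(all(p >= 0 & p <= 1, na.rm = TRUE))
  # Differences between adjacent columns must not be negative
  # (assumes times is sorted in increasing order).
  steps <- p[, -1, drop = FALSE] - p[, -ncol(p), drop = FALSE]
  stopifnot(all(steps >= -tol, na.rm = TRUE))
  invisible(TRUE)
}

# Made-up risk matrix for two subjects at three time points.
nd <- data.frame(X1 = c(0, 1))
tt <- c(0, 12, 24)
p <- rbind(c(0, 0.10, 0.30),
           c(0, 0.20, 0.25))
ok <- check_risk_matrix(p, newdata = nd, times = tt)
```

A matrix whose rows decrease at any point, or that has the wrong dimensions, makes the helper stop with an error.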
Thomas A. Gerds tag@biostat.ku.dk
## binary outcome
library(riskRegression)
library(rms)
set.seed(7)
d <- sampleData(80,outcome="binary")
nd <- sampleData(80,outcome="binary")
fit <- lrm(Y~X1+X8,data=d)
predictRisk(fit,newdata=nd)
## Not run: 
library(SuperLearner)
set.seed(1)
sl = SuperLearner(Y = d$Y, X = d[,-1], family = binomial(),
                  SL.library = c("SL.mean", "SL.glmnet", "SL.randomForest"))

## End(Not run)

## survival outcome
# generate survival data
library(prodlim)
set.seed(100)
d <- sampleData(100,outcome="survival")
d[,X1:=as.numeric(as.character(X1))]
d[,X2:=as.numeric(as.character(X2))]
# then fit a Cox model
library(rms)
cphmodel <- cph(Surv(time,event)~X1+X2,data=d,surv=TRUE,x=TRUE,y=TRUE)
# or via survival
library(survival)
coxphmodel <- coxph(Surv(time,event)~X1+X2,data=d,x=TRUE,y=TRUE)

# Extract predicted event probabilities (1-survival)
# at selected time-points:
ttt <- quantile(d$time)
# for selected predictor values:
ndat <- data.frame(X1=c(0.25,0.25,-0.05,0.05),X2=c(0,1,0,1))
# as follows
predictRisk(cphmodel,newdata=ndat,times=ttt)
predictRisk(coxphmodel,newdata=ndat,times=ttt)

# stratified cox model
sfit <- coxph(Surv(time,event)~strata(X1)+X2,data=d,x=TRUE,y=TRUE)
predictRisk(sfit,newdata=d[1:3,],times=c(1,3,5,10))

## simulate learning and validation data
learndat <- sampleData(100,outcome="survival")
valdat <- sampleData(100,outcome="survival")
## use the learning data to fit a Cox model
library(survival)
fitCox <- coxph(Surv(time,event)~X1+X2,data=learndat,x=TRUE,y=TRUE)
## suppose we want to predict the event probabilities for all subjects
## in the validation data at the following time points:
## 0, 12, 24, 36, 48, 60
psurv <- predictRisk(fitCox,newdata=valdat,times=seq(0,60,12))
## This is a matrix with event probabilities (1-survival)
## one column for each of the 6 time points
## one row for each validation set individual

# Do the same for a randomSurvivalForest model
# library(randomForestSRC)
# rsfmodel <- rfsrc(Surv(time,event)~X1+X2,data=learndat)
# prsfsurv=predictRisk(rsfmodel,newdata=valdat,times=seq(0,60,12))
# plot(psurv,prsfsurv)

## Cox with ridge option
f1 <- coxph(Surv(time,event)~X1+X2,data=learndat,x=TRUE,y=TRUE)
f2 <- coxph(Surv(time,event)~ridge(X1)+ridge(X2),data=learndat,x=TRUE,y=TRUE)
## Not run: 
plot(predictRisk(f1,newdata=valdat,times=10),
     riskRegression:::predictRisk.coxph(f2,newdata=valdat,times=10),
     xlim=c(0,1),
     ylim=c(0,1),
     xlab="Unpenalized predicted risk at 10",
     ylab="Ridge predicted risk at 10")

## End(Not run)

## competing risks
library(survival)
library(riskRegression)
library(prodlim)
train <- prodlim::SimCompRisk(100)
test <- prodlim::SimCompRisk(10)
cox.fit <- CSC(Hist(time,cause)~X1+X2,data=train)
predictRisk(cox.fit,newdata=test,times=1:10,cause=1)

## with strata
cox.fit2 <- CSC(list(Hist(time,cause)~strata(X1)+X2,Hist(time,cause)~X1+X2),data=train)
predictRisk(cox.fit2,newdata=test,times=1:10,cause=1)