Extracting predicted risks from regression models
Extract event probabilities from fitted regression models and machine learning objects.
The function predictRisk is a generic function, meaning that it invokes
specifically designed functions depending on the 'class' of the first
argument. See predictRisk.
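The dispatch mechanism described above can be sketched in base R. This is only an illustration of how an S3 generic selects a method by class, not the package's implementation; the generic toyRisk and its methods are hypothetical names chosen so that nothing clashes with predictRisk.

```r
# Toy S3 generic (hypothetical name, not part of riskRegression)
toyRisk <- function(object, newdata, ...) {
  UseMethod("toyRisk")
}

# Fallback used when no class-specific method exists
toyRisk.default <- function(object, newdata, ...) {
  stop("No toyRisk method for class: ", class(object)[1])
}

# Method for logistic regression fits: predicted event probabilities
toyRisk.glm <- function(object, newdata, ...) {
  as.numeric(stats::predict(object, newdata = newdata, type = "response"))
}

# Method for a plain numeric vector: treat it as already-computed risks
toyRisk.numeric <- function(object, newdata, ...) {
  stopifnot(length(object) == NROW(newdata))
  object
}

# Simulated binary outcome data (assumption: purely illustrative)
set.seed(1)
dat <- data.frame(x = rnorm(50))
dat$y <- rbinom(50, 1, plogis(dat$x))
fit <- glm(y ~ x, data = dat, family = binomial())

# class(fit) starts with "glm", so toyRisk.glm is called
p_glm <- toyRisk(fit, newdata = dat[1:5, , drop = FALSE])
# a double vector dispatches to toyRisk.numeric
p_num <- toyRisk(c(0.1, 0.9), newdata = dat[1:2, , drop = FALSE])
```

The same call, toyRisk(object, newdata), therefore runs different code depending on class(object), which is the design that lets one function name cover many model types.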
predictRisk(object, newdata, ...)

## Default S3 method:
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'double'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'integer'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'factor'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'numeric'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'glm'
predictRisk(object, newdata, iid = FALSE, average.iid = FALSE, ...)

## S3 method for class 'formula'
predictRisk(object, newdata, ...)

## S3 method for class 'BinaryTree'
predictRisk(object, newdata, ...)

## S3 method for class 'lrm'
predictRisk(object, newdata, ...)

## S3 method for class 'rpart'
predictRisk(object, newdata, ...)

## S3 method for class 'randomForest'
predictRisk(object, newdata, ...)

## S3 method for class 'matrix'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'aalen'
predictRisk(object, newdata, times, ...)

## S3 method for class 'cox.aalen'
predictRisk(object, newdata, times, ...)

## S3 method for class 'coxph'
predictRisk(
  object,
  newdata,
  times,
  product.limit = FALSE,
  diag = FALSE,
  iid = FALSE,
  average.iid = FALSE,
  ...
)

## S3 method for class 'coxphTD'
predictRisk(object, newdata, times, landmark, ...)

## S3 method for class 'CSCTD'
predictRisk(object, newdata, times, cause, landmark, ...)

## S3 method for class 'coxph.penal'
predictRisk(object, newdata, times, ...)

## S3 method for class 'cph'
predictRisk(
  object,
  newdata,
  times,
  product.limit = FALSE,
  diag = FALSE,
  iid = FALSE,
  average.iid = FALSE,
  ...
)

## S3 method for class 'selectCox'
predictRisk(object, newdata, times, ...)

## S3 method for class 'prodlim'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'survfit'
predictRisk(object, newdata, times, ...)

## S3 method for class 'psm'
predictRisk(object, newdata, times, ...)

## S3 method for class 'ranger'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'rfsrc'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'FGR'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'riskRegression'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'ARR'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'CauseSpecificCox'
predictRisk(
  object,
  newdata,
  times,
  cause,
  product.limit = TRUE,
  diag = FALSE,
  iid = FALSE,
  average.iid = FALSE,
  ...
)

## S3 method for class 'penfitS3'
predictRisk(object, newdata, times, ...)

## S3 method for class 'SuperPredictor'
predictRisk(object, newdata, ...)

## S3 method for class 'gbm'
predictRisk(object, newdata, times, ...)

## S3 method for class 'flexsurvreg'
predictRisk(object, newdata, times, ...)

## S3 method for class 'singleEventCB'
predictRisk(object, newdata, times, cause, ...)

## S3 method for class 'wglm'
predictRisk(
  object,
  newdata,
  times = NULL,
  product.limit = FALSE,
  diag = FALSE,
  iid = FALSE,
  average.iid = FALSE,
  ...
)
object |
A fitted model from which to extract predicted event probabilities. |
newdata |
A data frame containing predictor variable combinations for which to compute predicted event probabilities. |
... |
Additional arguments that are passed on to the current method. |
times |
A vector of times in the range of the response variable, for which the cumulative incidences (event probabilities) are computed. |
cause |
Identifies the cause of interest among the competing events. |
iid |
Should the iid decomposition be output using an attribute? |
average.iid |
Should the average iid decomposition be output using an attribute? |
product.limit |
If |
diag |
when |
landmark |
The starting time for the computation of the cumulative risk. |
In uncensored binary outcome data there is no need to choose a time point.
When operating on models for survival analysis (without competing risks), the function still predicts the risk, as 1 - S(t|X), where S(t|X) is the survival probability of a subject characterized by X.
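As a small numerical illustration of the relation risk = 1 - S(t|X), the following base R sketch assumes an exponential proportional hazards model, S(t|x) = exp(-lambda0 * exp(beta * x) * t). The model and the values of lambda0 and beta are assumptions made up for this example; they are not taken from the package.

```r
# Hypothetical parameters of an exponential proportional hazards model
lambda0 <- 0.05          # baseline hazard rate
beta <- 0.7              # log hazard ratio per unit of x
x <- c(0, 1, 2)          # covariate values of three subjects
times <- c(0, 5, 10)     # prediction time horizons

# outer() builds a subjects-by-times matrix of cumulative hazards
surv <- exp(-outer(lambda0 * exp(beta * x), times))

# Risk of the event by time t is one minus the survival probability
risk <- 1 - surv
dimnames(risk) <- list(paste0("x=", x), paste0("t=", times))
risk
```

The result has one row per subject and one column per time point, which is the layout predictRisk returns for survival outcomes.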
When there are competing risks (and the data are right censored) one needs to specify both the time horizon for prediction (can be a vector) and the cause of the event. The function then extracts the absolute risks F_c(t|X), also known as the cumulative incidence of an event of type/cause c until time t for a subject characterized by X. Depending on the model it may or may not be possible to predict the risk of all causes in a competing risks setting. For example, a cause-specific Cox (CSC) object allows predicting the risk of both causes, whereas a Fine-Gray regression model (FGR) is specific to one of the causes.
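To make the absolute risk F_c(t) concrete, here is a base R sketch for the simplest possible case: two competing causes with constant cause-specific hazards and no covariates. Under that assumption, F_c(t) = (lambda_c / lambda_total) * (1 - exp(-lambda_total * t)). The hazard values are hypothetical and the code does not use the package.

```r
# Hypothetical constant cause-specific hazards for two competing causes
lambda <- c(cause1 = 0.03, cause2 = 0.01)
times <- c(1, 5, 10)

total <- sum(lambda)
event_free <- exp(-total * times)   # probability of no event of any cause by t

# Cumulative incidence of each cause: one row per time, one column per cause
cif <- sapply(lambda, function(l) (l / total) * (1 - event_free))
rownames(cif) <- paste0("t=", times)
cif

# The cause-specific risks add up to the risk of any event, 1 - S(t)
rowSums(cif)
```

Note that the risk of cause 1 depends on the hazard of cause 2 as well, which is why a model for only one cause-specific hazard is not enough to predict absolute risks.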
For binary outcomes, a vector of predicted risks. For survival outcomes, with or without
competing risks,
a matrix with as many rows as NROW(newdata) and as many
columns as length(times). Each entry is a probability, and within
each row the values should be non-decreasing across the time points.
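The properties of the returned matrix can be checked with a few lines of base R. The helper below, check_risk_matrix, is a hypothetical name introduced for this sketch and is not part of the package; it assumes that times is sorted in increasing order.

```r
# Hypothetical helper: checks the documented shape and range of a risk matrix
check_risk_matrix <- function(p, newdata, times) {
  stopifnot(is.matrix(p))
  c(
    # one row per subject in newdata, one column per time point
    dims_ok = identical(dim(p), c(NROW(newdata), length(times))),
    # every entry is a probability
    in_unit_interval = all(p >= 0 & p <= 1, na.rm = TRUE),
    # risks do not decrease over time within a subject
    rows_nondecreasing = all(apply(p, 1, function(r) all(diff(r) >= 0)),
                             na.rm = TRUE)
  )
}

# Made-up predictions for two subjects at three time points
nd <- data.frame(id = 1:2)
tt <- c(0, 5, 10)
good <- matrix(c(0, 0.1, 0.3,
                 0, 0.2, 0.5), nrow = 2, byrow = TRUE)
bad <- matrix(c(0, 0.4, 0.3,
                0, 0.2, 0.5), nrow = 2, byrow = TRUE)

res_good <- check_risk_matrix(good, nd, tt)
res_bad <- check_risk_matrix(bad, nd, tt)
```

Here res_bad flags the first row, where the risk drops from 0.4 to 0.3 between the second and third time point.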
Thomas A. Gerds tag@biostat.ku.dk
## binary outcome
library(riskRegression) # provides sampleData() and predictRisk()
library(rms)
set.seed(7)
d <- sampleData(80,outcome="binary")
nd <- sampleData(80,outcome="binary")
fit <- lrm(Y~X1+X8,data=d)
predictRisk(fit,newdata=nd)
## Not run:
library(SuperLearner)
set.seed(1)
sl = SuperLearner(Y = d$Y, X = d[,-1], family = binomial(),
SL.library = c("SL.mean", "SL.glmnet", "SL.randomForest"))
## End(Not run)
## survival outcome
# generate survival data
library(prodlim)
set.seed(100)
d <- sampleData(100,outcome="survival")
d[,X1:=as.numeric(as.character(X1))]
d[,X2:=as.numeric(as.character(X2))]
# then fit a Cox model
library(rms)
cphmodel <- cph(Surv(time,event)~X1+X2,data=d,surv=TRUE,x=TRUE,y=TRUE)
# or via survival
library(survival)
coxphmodel <- coxph(Surv(time,event)~X1+X2,data=d,x=TRUE,y=TRUE)
# Extract predicted risks (1 - survival probability)
# at selected time-points:
ttt <- quantile(d$time)
# for selected predictor values:
ndat <- data.frame(X1=c(0.25,0.25,-0.05,0.05),X2=c(0,1,0,1))
# as follows
predictRisk(cphmodel,newdata=ndat,times=ttt)
predictRisk(coxphmodel,newdata=ndat,times=ttt)
# stratified cox model
sfit <- coxph(Surv(time,event)~strata(X1)+X2,data=d,x=TRUE,y=TRUE)
predictRisk(sfit,newdata=d[1:3,],times=c(1,3,5,10))
## simulate learning and validation data
learndat <- sampleData(100,outcome="survival")
valdat <- sampleData(100,outcome="survival")
## use the learning data to fit a Cox model
library(survival)
fitCox <- coxph(Surv(time,event)~X1+X2,data=learndat,x=TRUE,y=TRUE)
## suppose we want to predict the risks (1 - survival probability) for all subjects
## in the validation data at the following time points:
## 0, 12, 24, 36, 48, 60
psurv <- predictRisk(fitCox,newdata=valdat,times=seq(0,60,12))
## This is a matrix with event probabilities (1-survival)
## one column for each of the 6 time points
## one row for each validation set individual
# Do the same for a randomSurvivalForest model
# library(randomForestSRC)
# rsfmodel <- rfsrc(Surv(time,event)~X1+X2,data=learndat)
# prsfsurv=predictRisk(rsfmodel,newdata=valdat,times=seq(0,60,12))
# plot(psurv,prsfsurv)
## Cox with ridge option
f1 <- coxph(Surv(time,event)~X1+X2,data=learndat,x=TRUE,y=TRUE)
f2 <- coxph(Surv(time,event)~ridge(X1)+ridge(X2),data=learndat,x=TRUE,y=TRUE)
## Not run:
plot(predictRisk(f1,newdata=valdat,times=10),
riskRegression:::predictRisk.coxph(f2,newdata=valdat,times=10),
xlim=c(0,1),
ylim=c(0,1),
xlab="Unpenalized predicted risk at 10",
ylab="Ridge predicted risk at 10")
## End(Not run)
## competing risks
library(survival)
library(riskRegression)
library(prodlim)
train <- prodlim::SimCompRisk(100)
test <- prodlim::SimCompRisk(10)
cox.fit <- CSC(Hist(time,cause)~X1+X2,data=train)
predictRisk(cox.fit,newdata=test,times=seq(1:10),cause=1)
## with strata
cox.fit2 <- CSC(list(Hist(time,cause)~strata(X1)+X2,Hist(time,cause)~X1+X2),data=train)
predictRisk(cox.fit2,newdata=test,times=seq(1:10),cause=1)