discSurv: dataLongSubDist – R documentation

dataLongSubDist

Data Matrix and Weights for Discrete Subdistribution Hazard Models

Description

Generates the augmented data matrix and the weights required for discrete subdistribution hazard modeling with right censoring.

Usage

dataLongSubDist(dataSet, timeColumn, eventColumns, eventFocus,
                timeAsFactor = TRUE)

Arguments

`dataSet`	Original data in short format. Must be of class "data.frame".
`timeColumn`	Character specifying the column name of the observed event times. The observed times must be discrete (integer valued).
`eventColumns`	Character vector specifying the column names of the event indicators (excluding censoring events). All events must use 0-1 coding. Rows in which all event columns are zero are treated as censored.
`eventFocus`	Character specifying the column name of the event of interest (type 1 event).
`timeAsFactor`	Logical indicating whether time should be coded as a factor in the augmented data matrix. If FALSE, a numeric coding will be used.
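
To illustrate the input format described above, here is a minimal base R sketch of a short-format data set and of the checks implied by the argument descriptions. The data and all column names ("time", "event1", "event2", "x") are hypothetical and chosen only for illustration; the checks are not taken from the package source.

```r
# Hypothetical short-format data: one row per subject.
# "time", "event1", "event2" and "x" are illustrative names only.
toyData <- data.frame(
  time   = c(1L, 3L, 2L, 4L),        # discrete (integer) observed times
  event1 = c(1, 0, 0, 0),            # 0-1 indicator: event of interest
  event2 = c(0, 1, 0, 0),            # 0-1 indicator: competing event
  x      = c(0.5, -1.2, 0.3, 1.1)    # a covariate
)
eventCols <- c("event1", "event2")

# The observed times should be integer valued
isInteger <- all(toyData$time == round(toyData$time))

# All event indicators should be coded 0 or 1
is01 <- all(as.matrix(toyData[, eventCols]) %in% c(0, 1))

# Rows whose event indicators sum to zero count as censored
nEvents <- rowSums(toyData[, eventCols])
isCensored <- nEvents == 0

isCensored
```

With data of this shape, a call would presumably pass timeColumn = "time", eventColumns = c("event1", "event2") and eventFocus = "event1".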

Details

This function sets up the augmented data matrix and the weights that are needed for weighted maximum likelihood (ML) estimation of the discrete subdistribution model proposed by Berger et al. (2018). The model is a discrete-time extension of the original subdistribution model proposed by Fine and Gray (1999).

Value

Data frame in long format with the additional column "subDistWeights". This column contains the weights needed to fit a weighted binary regression model, as described in Berger et al. (2018). The weights are computed from a life table estimator for the censoring event.
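
The idea of a life table estimator for the censoring event can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's internal code: the toy vectors "time" and "censored" are made up, and the at-risk convention used here (everyone with observed time >= t is at risk in interval t) is an assumption.

```r
# Toy data: discrete observed times and a censoring indicator
# (1 = censored, 0 = some event occurred). Hypothetical values.
time     <- c(1, 2, 2, 3, 3, 3, 4, 4)
censored <- c(0, 1, 0, 1, 0, 0, 1, 0)

intervals <- sort(unique(time))

# Number of subjects still under observation in each interval
atRisk <- sapply(intervals, function(t) sum(time >= t))

# Number of subjects censored in each interval
nCens <- sapply(intervals, function(t) sum(time == t & censored == 1))

# Discrete hazard of censoring, and the estimated probability G(t)
# of not having been censored up to and including interval t
hazCens <- nCens / atRisk
G <- cumprod(1 - hazCens)

data.frame(interval = intervals, atRisk, nCens, hazCens, G)
```

Weights of the kind stored in "subDistWeights" are built from estimates like G; the exact formula is given in Berger et al. (2018).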

Author(s)

Thomas Welchowski welchow@imbie.meb.uni-bonn.de

References

Moritz Berger, Matthias Schmid, Thomas Welchowski, Steffen Schmitz-Valckenberg and Jan Beyersmann (2018), Subdistribution Hazard Models for Competing Risks in Discrete Time, Biostatistics, doi: 10.1093/biostatistics/kxy069.

Jason P. Fine and Robert J. Gray (1999), A Proportional Hazards Model for the Subdistribution of a Competing Risk, Journal of the American Statistical Association 94, pages 496-509.

Examples

# Example with unemployment data
library(discSurv)
library(Ecdat)
data(UnempDur)

# Generate subsample and group spell durations into 4 discrete intervals:
# (0,4], (4,8], (8,16], (16,28]
SubUnempDur <- UnempDur[1:500, ]
SubUnempDur$time <- as.numeric(cut(SubUnempDur$spell, c(0, 4, 8, 16, 28)))

# Convert competing risks data to long format
# The event of interest is re-employment in a full-time job ("censor1")
SubUnempDurLong <- dataLongSubDist(dataSet = SubUnempDur, timeColumn = "time",
                                   eventColumns = c("censor1", "censor2", "censor3"),
                                   eventFocus = "censor1")
head(SubUnempDurLong)

# Fit discrete subdistribution hazard model with logistic link function
logisticSubDistr <- glm(y ~ timeInt + ui + age + logwage,
                        family = binomial(), data = SubUnempDurLong,
                        weights = SubUnempDurLong$subDistWeights)
summary(logisticSubDistr)
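
Estimated discrete subdistribution hazards lambda(1), ..., lambda(t) determine the cumulative incidence function of the event of interest through F(t) = 1 - prod(1 - lambda(s)) over s = 1, ..., t. The following base R sketch shows this conversion with made-up hazard values; the numbers are hypothetical and are not output of the model fitted above.

```r
# Hypothetical subdistribution hazards for intervals 1 to 4 (made-up values,
# not estimates from the unemployment data)
lambda <- c(0.10, 0.20, 0.25, 0.50)

# Cumulative incidence function: probability that the event of interest
# has occurred by the end of each interval
cif <- 1 - cumprod(1 - lambda)

data.frame(interval = seq_along(lambda), lambda, cif)
```

In practice the hazards would presumably come from predict() with type = "response" applied to the fitted model, using new data with one row per time interval.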

discSurv

Discrete Time Survival Analysis

v1.4.1

GPL-3

Authors

Thomas Welchowski <welchow@imbie.meb.uni-bonn.de> and Matthias Schmid <matthias.schmid@imbie.uni-bonn.de>

Initial release

2019-12-10