apricom: splitval – R documentation

apricom

splitval

Split-sample-derived Shrinkage After Estimation

Description

Shrink regression coefficients using a split-sample-derived shrinkage factor.

Usage

splitval(dataset, model, nrounds, fract, sdm, int = TRUE, int.adj)

Arguments

`dataset`	a dataset for regression analysis. Data should be in the form of a matrix, with the outcome variable as the final column. Applying the `datashape` function beforehand is recommended, especially if categorical predictors are present. For regression with an intercept, a column vector of 1s should be placed before the other columns of the dataset (see Examples).
`model`	type of regression model. Either "linear" or "logistic".
`nrounds`	the number of times to replicate the sample splitting process.
`fract`	the fraction of observations assigned to the training set.
`sdm`	a shrinkage design matrix. For examples, see `ols.shrink`.
`int`	logical. If TRUE the model will include a regression intercept.
`int.adj`	logical. If TRUE the regression intercept will be re-estimated after shrinkage of the regression coefficients.
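
The inputs described above can be assembled in base R along the following lines. This is a minimal sketch that mirrors Example 1; the object names (`X`, `dat`, `sdm`, `n_train`) are illustrative only, and the reading of `sdm` (a 0 leaves the intercept unshrunk, a 1 applies the shrinkage factor to that predictor) is an assumption inferred from that example rather than something stated in this documentation.

```r
# Illustrative input preparation in base R (object names are hypothetical).
data(iris)

# Predictors first, outcome (Petal.Width) as the final column.
X   <- as.matrix(iris[, 1:4])
# Prepend a column of 1s so that an intercept can be estimated.
dat <- cbind(1, X)

# One sdm column per coefficient: intercept + 3 predictors.
# Assumption: 0 = do not shrink (intercept), 1 = shrink (predictors).
n_coef <- ncol(dat) - 1          # every column except the outcome
sdm    <- matrix(c(0, rep(1, n_coef - 1)), nrow = 1)

# With fract = 0.75, roughly this many rows go to the training set
# (the exact rounding used by the package is not documented here).
fract   <- 0.75
n_train <- round(fract * nrow(dat))

dim(dat)   # 150 rows, 5 columns
sdm
n_train
```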

Details

This function applies sample-splitting to a dataset in order to derive a shrinkage factor and apply it to the regression coefficients. In each of `nrounds` rounds, the data are randomly split into two sets, a training set and a test set. Regression coefficients are estimated using the training set, and a shrinkage factor is then estimated using the test set. The mean of the `nrounds` shrinkage factors is applied to the original regression coefficients, and the regression intercept may be re-estimated (see `int.adj`).

This process can currently be applied to linear or logistic regression models.
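
The procedure can be sketched in base R for the linear case. This is not the package's implementation. It assumes that the shrinkage factor in each round is the calibration slope obtained by regressing the test-set outcome on the linear predictor built from the training-set coefficients, that the "original" coefficients are those fitted to the full dataset, and that the intercept is re-estimated so the mean prediction matches the mean outcome. The helper name `split_shrink_sketch` and all variable names are hypothetical.

```r
# Sketch of split-sample shrinkage for linear regression (base R only).
# Assumptions (not confirmed by the package documentation):
#   - per-round shrinkage factor = calibration slope in the test set
#   - intercept re-estimated as mean(y) - mean(X %*% shrunken slopes)
split_shrink_sketch <- function(X, y, nrounds = 100, fract = 0.75) {
  n <- nrow(X)
  n_train <- floor(fract * n)
  lambdas <- numeric(nrounds)

  for (i in seq_len(nrounds)) {
    train <- sample.int(n, n_train)
    # Fit on the training set (lm.fit needs an explicit intercept column).
    b <- lm.fit(cbind(1, X[train, , drop = FALSE]), y[train])$coefficients
    # Linear predictor for the held-out rows.
    lp <- drop(cbind(1, X[-train, , drop = FALSE]) %*% b)
    # Calibration slope: regress test outcomes on the linear predictor.
    lambdas[i] <- unname(coef(lm(y[-train] ~ lp))[2])
  }

  lambda <- mean(lambdas)
  # "Original" coefficients: fitted on the full dataset.
  raw <- lm.fit(cbind(1, X), y)$coefficients
  slopes <- lambda * raw[-1]
  # Re-estimate the intercept so mean prediction equals mean outcome.
  intercept <- mean(y) - sum(colMeans(X) * slopes)
  list(raw.coeff = raw, shrunk.coeff = c(intercept, slopes), lambda = lambda)
}

data(iris)
set.seed(321)
res <- split_shrink_sketch(as.matrix(iris[, 1:3]), iris$Petal.Width)
res$lambda
res$shrunk.coeff
```

A mean shrinkage factor close to 1 would indicate that the training-set models predict the held-out data with little overfitting; values well below 1 would shrink the slopes more strongly.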

Value

splitval returns a list containing the following:

`raw.coeff`	the raw regression model coefficients, pre-shrinkage.
`shrunk.coeff`	the shrunken regression model coefficients.
`lambda`	the mean shrinkage factor over Nrounds split-sample replicates.
`Nrounds`	the number of rounds of sample splitting.
`sdm`	the shrinkage design matrix used to apply the shrinkage factor(s) to the regression coefficients.

Examples

## Example 1: Linear regression using the iris dataset
## Split-sample-derived shrinkage with 100 rounds of sample-splitting
data(iris)
iris.data <- as.matrix(iris[, 1:4])
iris.data <- cbind(1, iris.data)
sdm1 <- matrix(c(0, 1, 1, 1), nrow = 1)
set.seed(321)
splitval(dataset = iris.data, model = "linear", nrounds = 100,
         fract = 0.75, sdm = sdm1, int = TRUE, int.adj = TRUE)

## Example 2: logistic regression using a subset of the mtcars data
## Split-sample-derived shrinkage
data(mtcars)
mtc.data <- cbind(1, datashape(mtcars, y = 8, x = c(1, 6, 9)))
head(mtc.data)
set.seed(123)
splitval(dataset = mtc.data, model = "logistic",
         nrounds = 100, fract = 0.5)

apricom

Tools for the a Priori Comparison of Regression Modelling Strategies

v1.0.0

GPL-2

Authors

Romin Pajouheshnia [aut, cre], Wiebe Pestman [aut], Rolf Groenwold [aut]

Initial release

2015-11-11
