Tuning the number of PCs in the PCR with compositional data using the α-transformation
This is a cross-validation procedure to decide on the number of principal components when using regression with compositional data (as predictor variables) using the α-transformation.
alfapcr.tune(y, x, model = "gaussian", nfolds = 10, maxk = 50, a = seq(-1, 1, by = 0.1), folds = NULL, ncores = 1, graph = TRUE, col.nu = 15, seed = FALSE)
y |
A vector with either continuous, binary or count data. |
x |
A matrix with the predictor variables, the compositional data. Zero values are allowed. |
model |
The type of regression model to fit. The possible values are "gaussian", "binomial" and "poisson". |
nfolds |
The number of folds for the K-fold cross validation, set to 10 by default. |
maxk |
The maximum number of principal components to check. |
a |
The value of the power transformation, it has to be between -1 and 1. If zero values are present it has to be greater than 0. If α=0 the isometric log-ratio transformation is applied and the solution exists in a closed form, since it the classical mutivariate regression. The estimated bias correction via the (Tibshirani and Tibshirani (2009) criterion is applied. |
folds |
If you have the list with the folds supply it here. You can also leave it NULL and it will create folds. |
ncores |
How many cores to use. If you have heavy computations or do not want to wait for long time more than 1 core (if available) is suggested. It is advisable to use it if you have many observations and or many variables, otherwise it will slow down th process. |
graph |
If graph is TRUE (default value) a filled contour plot will appear. |
col.nu |
A number parameter for the filled contour plot, taken into account only if graph is TRUE. |
seed |
If seed is TRUE the results will always be the same. |
The α-transformation is applied to the compositional data first and the function "pcr.tune" or "glmpcr.tune" is called.
If graph is TRUE a filled contour will appear. A list including:
mspe |
The MSPE where rows correspond to the α values and the columns to the number of principal components. |
best.par |
The best pair of α and number of principal components. |
performance |
The minimum mean squared error of prediction. |
runtime |
The time required by the cross-validation procedure. |
Michail Tsagris
R implementation and documentation: Michail Tsagris mtsagris@uoc.gr and Giorgos Athineou <gioathineou@gmail.com>
Tsagris M. (2015). Regression analysis with compositional data containing zero values. Chilean Journal of Statistics, 6(2): 47-57. https://arxiv.org/pdf/1508.01913v1.pdf
Tsagris M.T., Preston S. and Wood A.T.A. (2011). A data-based power transformation for compositional data. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain. https://arxiv.org/pdf/1106.1451.pdf
Jolliffe I.T. (2002). Principal Component Analysis.
library(MASS) y <- as.vector(fgl[, 1]) x <- as.matrix(fgl[, 2:9]) x <- x/ rowSums(x) mod <- alfapcr.tune(y, x, nfolds = 10, maxk = 50, a = seq(-1, 1, by = 0.1) )
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.