Longitudinal Functional Data Analysis using FPCA
Implements longitudinal functional data analysis (Park and Staicu, 2015). It decomposes longitudinally-observed functional observations in two steps. It first applies FPCA on a properly defined marginal covariance function and obtain estimated scores (mFPCA step). Then it further models the underlying process dynamics by applying another FPCA on a covariance of the estimated scores obtained in the mFPCA step. The function also allows to use a random effects model to study the underlying process dynamics instead of a KL expansion model in the second step. Scores in mFPCA step are estimated using numerical integration. Scores in sFPCA step are estimated under a mixed model framework.
fpca.lfda( Y, subject.index, visit.index, obsT = NULL, funcArg = NULL, numTEvalPoints = 41, newdata = NULL, fbps.knots = c(5, 10), fbps.p = 3, fbps.m = 2, mFPCA.pve = 0.95, mFPCA.knots = 35, mFPCA.p = 3, mFPCA.m = 2, mFPCA.npc = NULL, LongiModel.method = c("fpca.sc", "lme"), sFPCA.pve = 0.95, sFPCA.nbasis = 10, sFPCA.npc = NULL, gam.method = "REML", gam.kT = 10 )
Y |
a matrix of which each row corresponds to one curve observed on a regular and dense grid (dimension of N by m; N = total number of observed functions; m = number of grid points) |
subject.index |
subject id; vector of length N with each element corresponding a row of Y |
visit.index |
index for visits (repeated measures); vector of length N with each element corresponding a row of Y |
obsT |
actual time of visits at which a function is observed; vector of length N with each element corresponding a row of Y |
funcArg |
numeric; function argument |
numTEvalPoints |
total number of evaluation time points for visits; used for pre-binning in sFPCA step; defaults to 41 |
newdata |
an optional data frame providing predictors (i for subject id / Ltime for visit time) with which prediction is desired; defaults to NULL |
fbps.knots |
list of two vectors of knots or number of equidistanct knots for all dimensions
for a fast bivariate P-spline smoothing (fbps) method used to estimate a bivariate, smooth mean function; defaults to c(5,10); see |
fbps.p |
integer;degrees of B-spline functions to use for a fbps method; defaults to 3; see |
fbps.m |
integer;order of differencing penalty to use for a fbps method; defaults to 2; see |
mFPCA.pve |
proportion of variance explained for a mFPCA step; used to choose the number of principal components (PCs); defaults to 0.95; see |
mFPCA.knots |
number of knots to use or the vectors of knots in a mFPCA step; used for obtain a smooth estimate of a covariance function; defaults to 35; see |
mFPCA.p |
integer; the degree of B-spline functions to use in a mFPCA step; defaults to 3; see |
mFPCA.m |
integer;order of differencing penalty to use in a mFPCA step; defaults to 2; see |
mFPCA.npc |
pre-specified value for the number of principal components; if given, it overrides |
LongiModel.method |
model and estimation method for estimating covariance of estimated scores from a mFPCA step; either KL expansion model or random effects model; defaults to fpca.sc |
sFPCA.pve |
proportion of variance explained for sFPCA step; used to choose the number of principal components; defaults to 0.95; see |
sFPCA.nbasis |
number of B-spline basis functions used in sFPCA step for estimation of the mean function and bivariate smoothing of the covariance surface; defaults to 10; see |
sFPCA.npc |
pre-specified value for the number of principal components; if given, it overrides |
gam.method |
smoothing parameter estimation method when |
gam.kT |
dimension of basis functions to use; see |
A list with components
obsData |
observed data (input) |
i |
subject id |
funcArg |
function argument |
visitTime |
visit times |
fitted.values |
fitted values (in-sample); of the same dimension as Y |
fitted.values.all |
a list of which each component consists of a subject's fitted values at all pairs of evaluation points (s and T) |
predicted.values |
predicted values for variables provided in newdata |
bivariateSmoothMeanFunc |
estimated bivariate smooth mean function |
mFPCA.efunctions |
estimated eigenfunction in a mFPCA step |
mFPCA.evalues |
estimated eigenvalues in a mFPCA step |
mFPCA.npc |
number of principal components selected with pre-specified pve in a mFPCA step |
mFPCA.scree.eval |
estimated eigenvalues obtained with pre-specified pve = 0.9999; for scree plot |
sFPCA.xiHat.bySubj |
a list of which each component consists of a subject's predicted score functions evaluated at equidistanced grid in direction of visit time, T |
sFPCA.npc |
a vector of numbers of principal components selected in a sFPCA step with pre-specified pve; length of mFPCA.npc |
mFPCA.covar |
estimated marginal covariance |
sFPCA.longDynCov.k |
a list of estimated covariance of score function; length of mFPCA.npc |
A random effects model is recommended when a set of visit times for all subjects and visits is not dense in its range.
So Young Park spark13@ncsu.edu, Ana-Maria Staicu
Park, S.Y. and Staicu, A.M. (2015). Longitudinal functional data analysis. Stat 4 212-226.
## Not run: ######################################## ### Illustration with real data ### ######################################## data(DTI) MS <- subset(DTI, case ==1) # subset data with multiple sclerosis (MS) case index.na <- which(is.na(MS$cca)) Y <- MS$cca; Y[index.na] <- fpca.sc(Y)$Yhat[index.na]; sum(is.na(Y)) id <- MS$ID visit.index <- MS$visit visit.time <- MS$visit.time/max(MS$visit.time) lfpca.dti <- fpca.lfda(Y = Y, subject.index = id, visit.index = visit.index, obsT = visit.time, LongiModel.method = 'lme', mFPCA.pve = 0.95) TT <- seq(0,1,length.out=41); ss = seq(0,1,length.out=93) # estimated mean function persp(x = ss, y = TT, z = t(lfpca.dti$bivariateSmoothMeanFunc), xlab="s", ylab="visit times", zlab="estimated mean fn", col='light blue') # first three estimated marginal eigenfunctions matplot(ss, lfpca.dti$mFPCA.efunctions[,1:3], type='l', xlab='s', ylab='estimated eigen fn') # predicted scores function corresponding to first two marginal PCs matplot(TT, do.call(cbind, lapply(lfpca.dti$sFPCA.xiHat.bySubj, function(a) a[,1])), xlab="visit time (T)", ylab="xi_hat(T)", main = "k = 1", type='l') matplot(TT, do.call(cbind, lapply(lfpca.dti$sFPCA.xiHat.bySubj, function(a) a[,2])), xlab="visit time (T)", ylab="xi_hat(T)", main = "k = 2", type='l') # prediction of cca of first two subjects at T = 0, 0.5 and 1 (black, red, green) matplot(ss, t(lfpca.dti$fitted.values.all[[1]][c(1,21,41),]), type='l', lty = 1, ylab="", xlab="s", main = "Subject = 1") matplot(ss, t(lfpca.dti$fitted.values.all[[2]][c(1,21,41),]), type='l', lty = 1, ylab="", xlab="s", main = "Subject = 2") ######################################## ### Illustration with simulated data ### ######################################## ########################################################################################### # data generation ########################################################################################### set.seed(1) n <- 100 # number of subjects ss <- seq(0,1,length.out=101) TT <- seq(0, 1, length.out=41) mi <- runif(n, min=6, max=15) ij <- sapply(mi, function(a) sort(sample(1:41, size=a, replace=FALSE))) # error variances sigma <- 0.1 sigma_wn <- 0.2 lambdaTrue <- c(1,0.5) # True eigenvalues eta1True <- c(0.5, 0.5^2, 0.5^3) # True eigenvalues eta2True <- c(0.5^2, 0.5^3) # True eigenvalues phi <- sqrt(2)*cbind(sin(2*pi*ss),cos(2*pi*ss)) psi1 <- cbind(rep(1,length(TT)), sqrt(3)*(2*TT-1), sqrt(5)*(6*TT^2-6*TT+1)) psi2 <- sqrt(2)*cbind(sin(2*pi*TT),cos(2*pi*TT)) zeta1 <- sapply(eta1True, function(a) rnorm(n = n, mean = 0, sd = a)) zeta2 <- sapply(eta2True, function(a) rnorm(n = n, mean = 0, sd = a)) xi1 <- unlist(lapply(1:n, function(a) (zeta1 %*% t(psi1))[a,ij[[a]]] )) xi2 <- unlist(lapply(1:n, function(a) (zeta2 %*% t(psi2))[a,ij[[a]]] )) xi <- cbind(xi1, xi2) Tij <- unlist(lapply(1:n, function(i) TT[ij[[i]]] )) i <- unlist(lapply(1:n, function(i) rep(i, length(ij[[i]])))) j <- unlist(lapply(1:n, function(i) 1:length(ij[[i]]))) X <- xi %*% t(phi) meanFn <- function(s,t){ 0.5*t + 1.5*s + 1.3*s*t} mu <- matrix(meanFn(s = rep(ss, each=length(Tij)), t=rep(Tij, length(ss)) ) , nrow=nrow(X)) Y <- mu + X + matrix(rnorm(nrow(X)*ncol(phi), 0, sigma), nrow=nrow(X)) %*% t(phi) + #correlated error matrix(rnorm(length(X), 0, sigma_wn), nrow=nrow(X)) # white noise matplot(ss, t(Y[which(i==2),]), type='l', ylab="", xlab="functional argument", main="observations from subject i = 2") # END: data generation ########################################################################################### # Illustration I : when covariance of scores from a mFPCA step is estimated using fpca.sc ########################################################################################### est <- fpca.lfda(Y = Y, subject.index = i, visit.index = j, obsT = Tij, funcArg = ss, numTEvalPoints = length(TT), newdata = data.frame(i = c(1:3), Ltime = c(Tij[1], 0.2, 0.5)), fbps.knots = 35, fbps.p = 3, fbps.m = 2, LongiModel.method='fpca.sc', mFPCA.pve = 0.95, mFPCA.knots = 35, mFPCA.p = 3, mFPCA.m = 2, sFPCA.pve = 0.95, sFPCA.nbasis = 10, sFPCA.npc = NULL, gam.method = 'REML', gam.kT = 10) # mean function (true vs. estimated) par(mfrow=c(1,2)) persp(x=TT, y = ss, z= t(sapply(TT, function(a) meanFn(s=ss, t = a))), xlab="visit times", ylab="s", zlab="true mean fn") persp(x = TT, y = ss, est$bivariateSmoothMeanFunc, xlab="visit times", ylab="s", zlab="estimated mean fn", col='light blue') ################ mFPCA step ################ par(mfrow=c(1,2)) # marginal covariance fn (true vs. estimated) image(phi%*%diag(lambdaTrue)%*%t(phi)) image(est$mFPCA.covar) # eigenfunctions (true vs. estimated) matplot(ss, phi, type='l') matlines(ss, cbind(est$mFPCA.efunctions[,1], est$mFPCA.efunctions[,2]), type='l', lwd=2) # scree plot plot(cumsum(est$mFPCA.scree.eval)/sum(est$mFPCA.scree.eval), type='l', ylab = "Percentage of variance explained") points(cumsum(est$mFPCA.scree.eval)/sum(est$mFPCA.scree.eval), pch=16) ################ sFPCA step ################ par(mfrow=c(1,2)) print(est$mFPCA.npc) # k = 2 # covariance of score functions for k = 1 (true vs. estimated) image(psi1%*%diag(eta1True)%*%t(psi1), main='TRUE') image(est$sFPCA.longDynCov.k[[1]], main='ESTIMATED') # covariance of score functions for k = 2 (true vs. estimated) image(psi2%*%diag(eta2True)%*%t(psi2)) image(est$sFPCA.longDynCov.k[[2]]) # estimated scores functions matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,1])), xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1), lwd=1, lty=1) matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,2])), xlab="visit time", main="k=2",type='l', ylab="", col=rainbow(100, alpha = 1), lwd=1, lty=1) ################ In-sample and Out-of-sample Prediction ################ par(mfrow=c(1,2)) # fitted matplot(ss, t(Y[which(i==1),]), type='l', ylab="", xlab="functional argument") matlines(ss, t(est$fitted.values[which(i==1),]), type='l', lwd=2) # sanity check : expect fitted and predicted (obtained using info from newdata) # values to be the same plot(ss, est$fitted.values[1,], type='p', xlab="", ylab="", pch = 1, cex=1) lines(ss, est$predicted.values[1,], type='l', lwd=2, col='blue') all.equal(est$predicted.values[1,], est$fitted.values[1,]) ########################################################################################### # Illustration II : when covariance of scores from a mFPCA step is estimated using lmer ########################################################################################### est.lme <- fpca.lfda(Y = Y, subject.index = i, visit.index = j, obsT = Tij, funcArg = ss, numTEvalPoints = length(TT), newdata = data.frame(i = c(1:3), Ltime = c(Tij[1], 0.2, 0.5)), fbps.knots = 35, fbps.p = 3, fbps.m = 2, LongiModel.method='lme', mFPCA.pve = 0.95, mFPCA.knots = 35, mFPCA.p = 3, mFPCA.m = 2, gam.method = 'REML', gam.kT = 10) par(mfrow=c(2,2)) # fpca.sc vs. lme (assumes linearity) matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,1])), xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1), lwd=1, lty=1) matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,2])), xlab="visit time", main="k=2",type='l', ylab="", col=rainbow(100, alpha = 1), lwd=1, lty=1) matplot(TT, do.call(cbind,lapply(est.lme$sFPCA.xiHat.bySubj, function(a) a[,1])), xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1), lwd=1, lty=1) matplot(TT, do.call(cbind,lapply(est.lme$sFPCA.xiHat.bySubj, function(a) a[,2])), xlab="visit time", main="k=2", type='l', ylab="", col=rainbow(100, alpha = 1), lwd=1, lty=1) ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.