
fpca.lfda

Longitudinal Functional Data Analysis using FPCA


Description

Implements longitudinal functional data analysis (Park and Staicu, 2015). Longitudinally observed functional data are decomposed in two steps. First, FPCA is applied to a properly defined marginal covariance function, yielding estimated scores (mFPCA step). Then the underlying process dynamics are modeled by applying another FPCA to the covariance of the estimated scores from the mFPCA step (sFPCA step). Alternatively, a random effects model can be used in place of the KL expansion model to study the process dynamics in the second step. Scores in the mFPCA step are estimated by numerical integration; scores in the sFPCA step are estimated within a mixed model framework.
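
The two-step logic can be sketched with building blocks that refund already exports. The sketch below is a conceptual illustration only, not the internal implementation of fpca.lfda (which, among other things, estimates the mFPCA scores by numerical integration and embeds the sFPCA step in a mixed model); the toy data and all object names are assumptions made for illustration.

  library(refund)

  set.seed(1)
  ss    <- seq(0, 1, length.out = 101)                     # functional argument
  nSubj <- 30
  mi    <- sample(4:8, nSubj, replace = TRUE)              # visits per subject
  id    <- rep(seq_len(nSubj), times = mi)                 # subject index (length N)
  Tij   <- unlist(lapply(mi, function(k) sort(runif(k))))  # visit time of each curve
  N     <- length(id)
  Y     <- outer(rnorm(N), sin(2 * pi * ss)) +             # curve-specific signal
           outer(Tij, cos(2 * pi * ss)) +                  # drift with visit time
           matrix(rnorm(N * length(ss), sd = 0.2), N)      # white noise

  # Step 1 (mFPCA): FPCA of the pooled curves; the eigenfunctions of the
  # marginal covariance yield one score per curve for each retained PC.
  mStep  <- fpca.face(Y, pve = 0.95, knots = 35, p = 3, m = 2)
  scores <- mStep$scores

  # Step 2 (sFPCA): for each retained PC k, the scores are sparse functions
  # of visit time T; decompose them again by FPCA (KL expansion), or replace
  # this step by a random effects model (LongiModel.method = "lme").
  sStep1 <- fpca.sc(ydata = data.frame(.id = id, .index = Tij,
                                       .value = scores[, 1]),
                    pve = 0.95, nbasis = 10)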

Usage

fpca.lfda(
  Y,
  subject.index,
  visit.index,
  obsT = NULL,
  funcArg = NULL,
  numTEvalPoints = 41,
  newdata = NULL,
  fbps.knots = c(5, 10),
  fbps.p = 3,
  fbps.m = 2,
  mFPCA.pve = 0.95,
  mFPCA.knots = 35,
  mFPCA.p = 3,
  mFPCA.m = 2,
  mFPCA.npc = NULL,
  LongiModel.method = c("fpca.sc", "lme"),
  sFPCA.pve = 0.95,
  sFPCA.nbasis = 10,
  sFPCA.npc = NULL,
  gam.method = "REML",
  gam.kT = 10
)

Arguments

Y

a matrix in which each row corresponds to one curve observed on a regular, dense grid (dimension N by m, where N is the total number of observed functions and m the number of grid points)

subject.index

subject id; vector of length N with each element corresponding to a row of Y

visit.index

index for visits (repeated measures); vector of length N with each element corresponding to a row of Y

obsT

actual time of visits at which a function is observed; vector of length N with each element corresponding to a row of Y

funcArg

numeric vector; the functional argument (common grid on which each curve is observed)

numTEvalPoints

total number of evaluation time points for visits; used for pre-binning in sFPCA step; defaults to 41

newdata

an optional data frame providing the predictors (i for subject id, Ltime for visit time) at which prediction is desired; defaults to NULL; a sketch at the end of this section illustrates the expected format

fbps.knots

a list of two vectors of knots, or the numbers of equidistant knots for the two dimensions, used by the fast bivariate P-spline (fbps) method to estimate the bivariate smooth mean function; defaults to c(5, 10); see fbps

fbps.p

integer; degree of the B-spline basis functions used by the fbps method; defaults to 3; see fbps

fbps.m

integer; order of the differencing penalty used by the fbps method; defaults to 2; see fbps

mFPCA.pve

proportion of variance explained in the mFPCA step; used to choose the number of principal components (PCs); defaults to 0.95; see fpca.face

mFPCA.knots

number of knots, or a vector of knots, used in the mFPCA step to obtain a smooth estimate of the covariance function; defaults to 35; see fpca.face

mFPCA.p

integer; degree of the B-spline basis functions used in the mFPCA step; defaults to 3; see fpca.face

mFPCA.m

integer; order of the differencing penalty used in the mFPCA step; defaults to 2; see fpca.face

mFPCA.npc

pre-specified value for the number of principal components; if given, it overrides mFPCA.pve; defaults to NULL; see fpca.face

LongiModel.method

model and estimation method for the covariance of the estimated scores from the mFPCA step; either a KL expansion model ("fpca.sc") or a random effects model ("lme"); defaults to "fpca.sc"

sFPCA.pve

proportion of variance explained in the sFPCA step; used to choose the number of principal components; defaults to 0.95; see fpca.sc

sFPCA.nbasis

number of B-spline basis functions used in the sFPCA step for estimation of the mean function and bivariate smoothing of the covariance surface; defaults to 10; see fpca.sc

sFPCA.npc

pre-specified value for the number of principal components; if given, it overrides sFPCA.pve; defaults to NULL; see fpca.sc

gam.method

smoothing parameter estimation method used when gam predicts the score functions at unobserved visit times T; defaults to "REML"; see gam

gam.kT

dimension of the basis used by gam when predicting score functions over visit time; defaults to 10; see gam
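
As a quick illustration of the newdata contract (a data frame with columns i for subject id and Ltime for visit time), the call below mirrors Illustration I in the Examples and reuses the simulated objects Y, i, j, Tij, ss and TT built there; the comment on the last line states the expected, not guaranteed, shape of the result.

  new.visits <- data.frame(i = 1:3, Ltime = c(Tij[1], 0.2, 0.5))

  fit <- fpca.lfda(Y = Y,
                   subject.index = i, visit.index = j, obsT = Tij,
                   funcArg = ss, numTEvalPoints = length(TT),
                   newdata = new.visits,
                   LongiModel.method = "fpca.sc",  # KL expansion in the sFPCA step
                   mFPCA.pve = 0.95, sFPCA.pve = 0.95,
                   gam.method = "REML", gam.kT = 10)

  fit$predicted.values   # expected: one row per row of new.visits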

Value

A list with components

obsData

observed data (input)

i

subject id

funcArg

function argument

visitTime

visit times

fitted.values

fitted values (in-sample); of the same dimension as Y

fitted.values.all

a list in which each component contains a subject's fitted values at all pairs of evaluation points (s, T)

predicted.values

predicted values for variables provided in newdata

bivariateSmoothMeanFunc

estimated bivariate smooth mean function

mFPCA.efunctions

estimated eigenfunctions from the mFPCA step

mFPCA.evalues

estimated eigenvalues from the mFPCA step

mFPCA.npc

number of principal components selected with the pre-specified pve in the mFPCA step

mFPCA.scree.eval

estimated eigenvalues obtained with pve = 0.9999; useful for a scree plot

sFPCA.xiHat.bySubj

a list in which each component contains a subject's predicted score functions evaluated on an equidistant grid in the direction of visit time T

sFPCA.npc

a vector giving the number of principal components selected in the sFPCA step with the pre-specified pve; of length mFPCA.npc

mFPCA.covar

estimated marginal covariance

sFPCA.longDynCov.k

a list of estimated covariances of the score functions; of length mFPCA.npc
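
A brief sketch of how the main components fit together; est is the fit from the simulated-data illustration in the Examples, and the commented shapes are inferred from that illustration rather than guaranteed by the function.

  length(est$fitted.values.all)     # one element per subject
  dim(est$fitted.values.all[[1]])   # numTEvalPoints x length(funcArg)
  dim(est$sFPCA.xiHat.bySubj[[1]])  # numTEvalPoints x est$mFPCA.npc
  est$mFPCA.npc                     # number of marginal PCs retained
  length(est$sFPCA.npc)             # one entry per marginal PC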

Details

A random effects model is recommended when the set of visit times, pooled across all subjects and visits, is not dense in its range.
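
One way to act on this recommendation is to inspect how the pooled visit times cover their range before choosing LongiModel.method; the sketch below (reusing the simulated objects from the Examples) is illustrative and not part of fpca.lfda itself.

  # Tij: pooled visit times, one per observed curve, as in the Examples.
  hist(Tij, breaks = 20, xlab = "visit time (T)",
       main = "coverage of visit times")

  # If the coverage leaves large gaps, prefer the random effects model
  # for the second step.
  est.lme <- fpca.lfda(Y = Y, subject.index = i, visit.index = j, obsT = Tij,
                       LongiModel.method = "lme")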

Author(s)

So Young Park spark13@ncsu.edu, Ana-Maria Staicu

References

Park, S.Y. and Staicu, A.M. (2015). Longitudinal functional data analysis. Stat, 4, 212-226.

Examples

## Not run:  
  ########################################
  ###   Illustration with real data    ###
  ########################################   

  data(DTI)
  MS <- subset(DTI, case ==1)  # subset data with multiple sclerosis (MS) case

  index.na <- which(is.na(MS$cca))  
  Y <- MS$cca; Y[index.na] <- fpca.sc(Y)$Yhat[index.na]; sum(is.na(Y))
  id <- MS$ID 
  visit.index <- MS$visit 
  visit.time <- MS$visit.time/max(MS$visit.time)

  lfpca.dti <- fpca.lfda(Y = Y, subject.index = id,  
                         visit.index = visit.index, obsT = visit.time, 
                         LongiModel.method = 'lme',
                         mFPCA.pve = 0.95)
                         
  TT <- seq(0, 1, length.out = 41); ss <- seq(0, 1, length.out = 93)
  
  # estimated mean function
  persp(x = ss, y = TT, z = t(lfpca.dti$bivariateSmoothMeanFunc),
        xlab="s", ylab="visit times", zlab="estimated mean fn", col='light blue')
        
  # first three estimated marginal eigenfunctions
  matplot(ss, lfpca.dti$mFPCA.efunctions[,1:3], type='l', xlab='s', ylab='estimated eigen fn')
  
  # predicted scores function corresponding to first two marginal PCs
  matplot(TT, do.call(cbind, lapply(lfpca.dti$sFPCA.xiHat.bySubj, function(a) a[,1])),
          xlab="visit time (T)", ylab="xi_hat(T)", main = "k = 1", type='l')
  matplot(TT, do.call(cbind, lapply(lfpca.dti$sFPCA.xiHat.bySubj, function(a) a[,2])),
          xlab="visit time (T)", ylab="xi_hat(T)", main = "k = 2", type='l')

  # prediction of cca of first two subjects at T = 0, 0.5 and 1 (black, red, green)
  matplot(ss, t(lfpca.dti$fitted.values.all[[1]][c(1,21,41),]), 
         type='l', lty = 1, ylab="", xlab="s", main = "Subject = 1")    
  matplot(ss, t(lfpca.dti$fitted.values.all[[2]][c(1,21,41),]), 
         type='l', lty = 1, ylab="", xlab="s", main = "Subject = 2")    
     
  ########################################
  ### Illustration with simulated data ###
  ########################################   

  ###########################################################################################
  # data generation
  ###########################################################################################
  set.seed(1)
  n <- 100 # number of subjects
  ss <- seq(0,1,length.out=101) 
  TT <- seq(0, 1, length.out=41)
  mi <- runif(n, min=6, max=15)
  ij <- sapply(mi, function(a) sort(sample(1:41, size=a, replace=FALSE)))
  
  # error variances
  sigma <- 0.1 
  sigma_wn <- 0.2

  lambdaTrue <- c(1,0.5) # true eigenvalues (mFPCA level)
  eta1True <- c(0.5, 0.5^2, 0.5^3) # true eigenvalues for the k = 1 score process
  eta2True <- c(0.5^2, 0.5^3) # true eigenvalues for the k = 2 score process
  
  phi <- sqrt(2)*cbind(sin(2*pi*ss),cos(2*pi*ss))
  psi1 <- cbind(rep(1,length(TT)), sqrt(3)*(2*TT-1), sqrt(5)*(6*TT^2-6*TT+1))
  psi2 <- sqrt(2)*cbind(sin(2*pi*TT),cos(2*pi*TT))
  
  zeta1 <- sapply(eta1True, function(a) rnorm(n = n, mean = 0, sd = a))
  zeta2 <- sapply(eta2True, function(a) rnorm(n = n, mean = 0, sd = a))
  
  xi1 <- unlist(lapply(1:n, function(a) (zeta1 %*% t(psi1))[a,ij[[a]]] ))
  xi2 <- unlist(lapply(1:n, function(a) (zeta2 %*% t(psi2))[a,ij[[a]]] ))
  xi <- cbind(xi1, xi2)
  
  Tij <- unlist(lapply(1:n, function(i) TT[ij[[i]]] ))
  i <- unlist(lapply(1:n, function(i) rep(i, length(ij[[i]]))))
  j <- unlist(lapply(1:n, function(i) 1:length(ij[[i]])))
  
  X <- xi %*% t(phi)
  meanFn <- function(s,t){ 0.5*t + 1.5*s + 1.3*s*t}
  mu <- matrix(meanFn(s = rep(ss, each=length(Tij)), t=rep(Tij, length(ss)) ) , nrow=nrow(X))

  Y <- mu +  X + 
     matrix(rnorm(nrow(X)*ncol(phi), 0, sigma), nrow=nrow(X)) %*% t(phi) + #correlated error
     matrix(rnorm(length(X), 0, sigma_wn), nrow=nrow(X)) # white noise

  matplot(ss, t(Y[which(i==2),]), type='l', ylab="", xlab="functional argument", 
         main="observations from subject i = 2")
  # END: data generation
  
  ###########################################################################################
  # Illustration I : when covariance of scores from a mFPCA step is estimated using fpca.sc
  ###########################################################################################
  est <- fpca.lfda(Y = Y, 
                   subject.index = i, visit.index = j, obsT = Tij, 
                   funcArg = ss, numTEvalPoints = length(TT), 
                   newdata = data.frame(i = c(1:3), Ltime = c(Tij[1], 0.2, 0.5)), 
                   fbps.knots = 35, fbps.p = 3, fbps.m = 2,
                   LongiModel.method='fpca.sc',
                   mFPCA.pve = 0.95, mFPCA.knots = 35, mFPCA.p = 3, mFPCA.m = 2, 
                   sFPCA.pve = 0.95, sFPCA.nbasis = 10, sFPCA.npc = NULL,
                   gam.method = 'REML', gam.kT = 10)
  
  
  # mean function (true vs. estimated)
  par(mfrow=c(1,2))
  persp(x=TT, y = ss, z= t(sapply(TT, function(a) meanFn(s=ss, t = a))),
          xlab="visit times", ylab="s", zlab="true mean fn")
  persp(x = TT, y = ss, est$bivariateSmoothMeanFunc,
   xlab="visit times", ylab="s", zlab="estimated mean fn", col='light blue')
  
  ################   mFPCA step   ################
  par(mfrow=c(1,2))
  
  # marginal covariance fn (true vs. estimated)
  image(phi%*%diag(lambdaTrue)%*%t(phi))
  image(est$mFPCA.covar) 
  
  # eigenfunctions (true vs. estimated)
  matplot(ss, phi, type='l') 
  matlines(ss, cbind(est$mFPCA.efunctions[,1], est$mFPCA.efunctions[,2]), type='l', lwd=2)
  
  # scree plot
  plot(cumsum(est$mFPCA.scree.eval)/sum(est$mFPCA.scree.eval), type='l', 
       ylab = "Percentage of variance explained")
  points(cumsum(est$mFPCA.scree.eval)/sum(est$mFPCA.scree.eval), pch=16)
  
  ################   sFPCA step   ################
  par(mfrow=c(1,2))
  print(est$mFPCA.npc)  # k = 2
  
  # covariance of score functions for k = 1 (true vs. estimated)
  image(psi1%*%diag(eta1True)%*%t(psi1), main='TRUE')
  image(est$sFPCA.longDynCov.k[[1]], main='ESTIMATED')
  
  # covariance of score functions for k = 2 (true vs. estimated)
  image(psi2%*%diag(eta2True)%*%t(psi2))
  image(est$sFPCA.longDynCov.k[[2]])
  
  # estimated scores functions
  matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,1])), 
          xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1), 
          lwd=1, lty=1)
  matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,2])), 
          xlab="visit time", main="k=2",type='l', ylab="", col=rainbow(100, alpha = 1), 
          lwd=1, lty=1)
  
  ################   In-sample and Out-of-sample Prediction   ################
  par(mfrow=c(1,2))
  # fitted
  matplot(ss, t(Y[which(i==1),]), type='l', ylab="", xlab="functional argument")
  matlines(ss, t(est$fitted.values[which(i==1),]), type='l', lwd=2)
  
  # sanity check : expect fitted and predicted (obtained using info from newdata)
  #                values to be the same
 
  plot(ss, est$fitted.values[1,], type='p', xlab="", ylab="", pch = 1, cex=1)
  lines(ss, est$predicted.values[1,], type='l', lwd=2, col='blue')
  all.equal(est$predicted.values[1,], est$fitted.values[1,])
  
  ###########################################################################################
  # Illustration II : when covariance of scores from a mFPCA step is estimated using lmer
  ###########################################################################################
  est.lme <- fpca.lfda(Y = Y, 
                       subject.index = i, visit.index = j, obsT = Tij,
                       funcArg = ss, numTEvalPoints = length(TT), 
                       newdata = data.frame(i = c(1:3), Ltime = c(Tij[1], 0.2, 0.5)), 
                       fbps.knots = 35, fbps.p = 3, fbps.m = 2,
                       LongiModel.method='lme',
                       mFPCA.pve = 0.95, mFPCA.knots = 35, mFPCA.p = 3, mFPCA.m = 2, 
                       gam.method = 'REML', gam.kT = 10)
  
  par(mfrow=c(2,2))
  
  # fpca.sc vs. lme (assumes linearity)
  matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,1])), 
          xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1), 
          lwd=1, lty=1)
  matplot(TT, do.call(cbind,lapply(est$sFPCA.xiHat.bySubj, function(a) a[,2])), 
          xlab="visit time", main="k=2",type='l', ylab="", col=rainbow(100, alpha = 1), 
          lwd=1, lty=1)
          
  matplot(TT, do.call(cbind,lapply(est.lme$sFPCA.xiHat.bySubj, function(a) a[,1])), 
          xlab="visit time", main="k=1", type='l', ylab="", col=rainbow(100, alpha = 1), 
          lwd=1, lty=1)
  matplot(TT, do.call(cbind,lapply(est.lme$sFPCA.xiHat.bySubj, function(a) a[,2])), 
          xlab="visit time", main="k=2", type='l', ylab="", col=rainbow(100, alpha = 1),
          lwd=1, lty=1)
  
## End(Not run)

refund: Regression with Functional Data
Version: 0.1-23
License: GPL (>= 2)
Authors: Jeff Goldsmith [aut], Fabian Scheipl [aut], Lei Huang [aut], Julia Wrobel [aut, cre], Chongzhi Di [aut], Jonathan Gellar [aut], Jaroslaw Harezlak [aut], Mathew W. McLean [aut], Bruce Swihart [aut], Luo Xiao [aut], Ciprian Crainiceanu [aut], Philip T. Reiss [aut], Yakuan Chen [ctb], Sonja Greven [ctb], Lan Huo [ctb], Madan Gopal Kundu [ctb], So Young Park [ctb], David L. Miller [ctb], Ana-Maria Staicu [ctb]
Initial release: 2020-12-03
