Tucker Factor Analysis
Fits Ledyard R. Tucker's factor analysis model to 3-way or 4-way data arrays. Parameters are estimated via alternating least squares.
tucker(X, nfac, nstart = 10, Afixed = NULL, Bfixed = NULL, Cfixed = NULL, Dfixed = NULL, Bstart = NULL, Cstart = NULL, Dstart = NULL, maxit = 500, ctol = 1e-4, parallel = FALSE, cl = NULL, output = c("best", "all"), verbose = TRUE)
X |
Three-way data array with |
nfac |
Number of factors in each mode. |
nstart |
Number of random starts. |
Afixed |
Fixed Mode A weights. Only used to fit model with fixed weights in Mode A. |
Bfixed |
Fixed Mode B weights. Only used to fit model with fixed weights in Mode B. |
Cfixed |
Fixed Mode C weights. Only used to fit model with fixed weights in Mode C. |
Dfixed |
Fixed Mode D weights. Only used to fit model with fixed weights in Mode D. |
Bstart |
Starting Mode B weights for ALS algorithm. Default uses random weights. |
Cstart |
Starting Mode C weights for ALS algorithm. Default uses random weights. |
Dstart |
Starting Mode D weights for ALS algorithm. Default uses random weights. |
maxit |
Maximum number of iterations. |
ctol |
Convergence tolerance. |
parallel |
Logical indicating if |
cl |
Cluster created by |
output |
Output the best solution (default) or output all |
verbose |
If |
Given a 3-way array X = array(x,dim=c(I,J,K))
, the 3-way Tucker model can be written as
X[i,j,k] = sum sum sum A[i,p]*B[j,q]*C[k,r]*G[p,q,r] + E[i,j,k]
|
where A = matrix(a,I,P)
are the Mode A (first mode) weights, B = matrix(b,J,Q)
are the Mode B (second mode) weights, C = matrix(c,K,R)
are the Mode C (third mode) weights, G = array(g,dim=c(P,Q,R))
is the 3-way core array, and E = array(e,dim=c(I,J,K))
is the 3-way residual array. The summations are for p = seq(1,P)
, q = seq(1,Q)
, and r = seq(1,R)
.
Given a 4-way array X = array(x,dim=c(I,J,K,L))
, the 4-way Tucker model can be written as
X[i,j,k,l] = sum sum sum sum A[i,p]*B[j,q]*C[k,r]*D[l,s]*G[p,q,r,s] + E[i,j,k,l]
|
where D = matrix(d,L,S)
are the Mode D (fourth mode) weights, G = array(g,dim=c(P,Q,R,S))
is the 4-way residual array, E = array(e,dim=c(I,J,K,L))
is the 4-way residual array, and the other terms can be interprered as previously described.
Weight matrices are estimated using an alternating least squares algorithm.
If output="best"
, returns an object of class "tucker"
with the following elements:
A |
Mode A weight matrix. |
B |
Mode B weight matrix. |
C |
Mode C weight matrix. |
D |
Mode D weight matrix. |
G |
Core array. |
SSE |
Sum of Squared Errors. |
Rsq |
R-squared value. |
GCV |
Generalized Cross-Validation. |
edf |
Effective degrees of freedom. |
iter |
Number of iterations. |
cflag |
Convergence flag. |
Otherwise returns a list of length nstart
where each element is an object of class "tucker"
.
The ALS algorithm can perform poorly if the number of factors nfac
is set too large.
Input matrices in Afixed
, Bfixed
, Cfixed
, Dfixed
, Bstart
, Cstart
, and Dstart
must be columnwise orthonormal.
Default use is 10 random strarts (nstart=10
) with 500 maximum iterations of the ALS algorithm for each start (maxit=500
) using a convergence tolerance of 1e-4 (ctol=1e-4
). The algorithm is determined to have converged once the change in R^2 is less than or equal to ctol
.
Output cflag
gives convergence information: cflag=0
if ALS algorithm converged normally, and cflag=1
if maximum iteration limit was reached before convergence.
Missing data should be specified as NA
values in the input X
. The missing data are randomly initialized and then iteratively imputed as a part of the ALS algorithm.
Nathaniel E. Helwig <helwig@umn.edu>
Kroonenberg, P. M., & de Leeuw, J. (1980). Principal component analysis of three-mode data by means of alternating least squares algorithms. Psychometrika, 45, 69-97.
Tucker, L. R. (1966). Some mathematical notes on three-mode factor analysis. Psychometrika, 31, 279-311.
########## 3-way example ########## ####****#### TUCKER3 ####****#### # create random data array with Tucker3 structure set.seed(3) mydim <- c(50,20,5) nf <- c(3,2,3) Amat <- matrix(rnorm(mydim[1]*nf[1]), mydim[1], nf[1]) Amat <- svd(Amat, nu = nf[1], nv = 0)$u Bmat <- matrix(rnorm(mydim[2]*nf[2]), mydim[2], nf[2]) Bmat <- svd(Bmat, nu = nf[2], nv = 0)$u Cmat <- matrix(rnorm(mydim[3]*nf[3]), mydim[3], nf[3]) Cmat <- svd(Cmat, nu = nf[3], nv = 0)$u Gmat <- matrix(rnorm(prod(nf)), nf[1], prod(nf[2:3])) Xmat <- tcrossprod(Amat %*% Gmat, kronecker(Cmat, Bmat)) Xmat <- array(Xmat, dim = mydim) Emat <- array(rnorm(prod(mydim)), dim = mydim) Emat <- nscale(Emat, 0, ssnew = sumsq(Xmat)) # SNR=1 X <- Xmat + Emat # fit Tucker3 model tuck <- tucker(X, nfac = nf, nstart = 1) tuck # check solution Xhat <- fitted(tuck) sum((Xmat-Xhat)^2) / prod(mydim) # reorder mode="A" tuck$A[1:4,] tuck$G tuck <- reorder(tuck, neworder = c(3,1,2), mode = "A") tuck$A[1:4,] tuck$G Xhat <- fitted(tuck) sum((Xmat-Xhat)^2)/prod(mydim) # reorder mode="B" tuck$B[1:4,] tuck$G tuck <- reorder(tuck, neworder=2:1, mode="B") tuck$B[1:4,] tuck$G Xhat <- fitted(tuck) sum((Xmat-Xhat)^2)/prod(mydim) # resign mode="C" tuck$C[1:4,] tuck <- resign(tuck, mode="C") tuck$C[1:4,] Xhat <- fitted(tuck) sum((Xmat-Xhat)^2)/prod(mydim) ####****#### TUCKER2 ####****#### # create random data array with Tucker2 structure set.seed(3) mydim <- c(50, 20, 5) nf <- c(3, 2, mydim[3]) Amat <- matrix(rnorm(mydim[1]*nf[1]), mydim[1], nf[1]) Amat <- svd(Amat, nu = nf[1], nv = 0)$u Bmat <- matrix(rnorm(mydim[2]*nf[2]), mydim[2], nf[2]) Bmat <- svd(Bmat, nu = nf[2], nv = 0)$u Cmat <- diag(nf[3]) Gmat <- matrix(rnorm(prod(nf)), nf[1], prod(nf[2:3])) Xmat <- tcrossprod(Amat %*% Gmat, kronecker(Cmat, Bmat)) Xmat <- array(Xmat, dim = mydim) Emat <- array(rnorm(prod(mydim)), dim = mydim) Emat <- nscale(Emat, 0, ssnew = sumsq(Xmat)) # SNR=1 X <- Xmat + Emat # fit Tucker2 model tuck <- tucker(X, nfac = nf, nstart = 1, Cfixed = diag(nf[3])) tuck # check solution Xhat <- fitted(tuck) sum((Xmat-Xhat)^2) / prod(mydim) ####****#### TUCKER1 ####****#### # create random data array with Tucker1 structure set.seed(3) mydim <- c(50, 20, 5) nf <- c(3, mydim[2:3]) Amat <- matrix(rnorm(mydim[1]*nf[1]), mydim[1], nf[1]) Amat <- svd(Amat, nu = nf[1], nv = 0)$u Bmat <- diag(nf[2]) Cmat <- diag(nf[3]) Gmat <- matrix(rnorm(prod(nf)), nf[1], prod(nf[2:3])) Xmat <- tcrossprod(Amat %*% Gmat, kronecker(Cmat, Bmat)) Xmat <- array(Xmat, dim = mydim) Emat <- array(rnorm(prod(mydim)), dim = mydim) Emat <- nscale(Emat, 0, ssnew = sumsq(Xmat)) # SNR=1 X <- Xmat + Emat # fit Tucker1 model tuck <- tucker(X, nfac = nf, nstart = 1, Bfixed = diag(nf[2]), Cfixed = diag(nf[3])) tuck # check solution Xhat <- fitted(tuck) sum((Xmat-Xhat)^2) / prod(mydim) # closed-form Tucker1 solution via SVD tsvd <- svd(matrix(X, nrow = mydim[1]), nu = nf[1], nv = nf[1]) Gmat0 <- t(tsvd$v %*% diag(tsvd$d[1:nf[1]])) Xhat0 <- array(tsvd$u %*% Gmat0, dim = mydim) sum((Xmat-Xhat0)^2) / prod(mydim) # get Mode A weights and core array tuck0 <- NULL tuck0$A <- tsvd$u # A weights tuck0$G <- array(Gmat0, dim = nf) # core array ########## 4-way example ########## # create random data array with Tucker structure set.seed(4) mydim <- c(30,10,8,10) nf <- c(2,3,4,3) Amat <- svd(matrix(rnorm(mydim[1]*nf[1]),mydim[1],nf[1]),nu=nf[1])$u Bmat <- svd(matrix(rnorm(mydim[2]*nf[2]),mydim[2],nf[2]),nu=nf[2])$u Cmat <- svd(matrix(rnorm(mydim[3]*nf[3]),mydim[3],nf[3]),nu=nf[3])$u Dmat <- svd(matrix(rnorm(mydim[4]*nf[4]),mydim[4],nf[4]),nu=nf[4])$u Gmat <- array(rnorm(prod(nf)),dim=nf) Xmat <- array(tcrossprod(Amat%*%matrix(Gmat,nf[1],prod(nf[2:4])), kronecker(Dmat,kronecker(Cmat,Bmat))),dim=mydim) Emat <- array(rnorm(prod(mydim)),dim=mydim) Emat <- nscale(Emat, 0, ssnew = sumsq(Xmat)) # SNR=1 X <- Xmat + Emat # fit Tucker model tuck <- tucker(X,nfac=nf,nstart=1) tuck # check solution Xhat <- fitted(tuck) sum((Xmat-Xhat)^2)/prod(mydim) ## Not run: ########## parallel computation ########## # create random data array with Tucker structure set.seed(3) mydim <- c(50,20,5) nf <- c(3,2,3) Amat <- svd(matrix(rnorm(mydim[1]*nf[1]),mydim[1],nf[1]),nu=nf[1])$u Bmat <- svd(matrix(rnorm(mydim[2]*nf[2]),mydim[2],nf[2]),nu=nf[2])$u Cmat <- svd(matrix(rnorm(mydim[3]*nf[3]),mydim[3],nf[3]),nu=nf[3])$u Gmat <- array(rnorm(prod(nf)),dim=nf) Xmat <- array(tcrossprod(Amat%*%matrix(Gmat,nf[1],nf[2]*nf[3]),kronecker(Cmat,Bmat)),dim=mydim) Emat <- array(rnorm(prod(mydim)),dim=mydim) Emat <- nscale(Emat, 0, ssnew = sumsq(Xmat)) # SNR=1 X <- Xmat + Emat # fit Tucker model (10 random starts -- sequential computation) set.seed(1) system.time({tuck <- tucker(X,nfac=nf)}) tuck$Rsq # fit Tucker model (10 random starts -- parallel computation) cl <- makeCluster(detectCores()) ce <- clusterEvalQ(cl,library(multiway)) clusterSetRNGStream(cl, 1) system.time({tuck <- tucker(X,nfac=nf,parallel=TRUE,cl=cl)}) tuck$Rsq stopCluster(cl) ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.