TAM: tam.mml.3pl – R documentation

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

tam.mml.3pl

3PL Structured Item Response Model in TAM

Description

This estimates a 3PL model with design matrices for item slopes and item intercepts. Discrete distributions of the latent variables are allowed which can be log-linearly smoothed.

Usage

tam.mml.3pl(resp, Y=NULL, group=NULL, formulaY=NULL, dataY=NULL, ndim=1,
  pid=NULL, xsi.fixed=NULL, xsi.inits=NULL, xsi.prior=NULL,
  beta.fixed=NULL, beta.inits=NULL, variance.fixed=NULL, variance.inits=NULL,
  est.variance=TRUE, A=NULL, notA=FALSE, Q=NULL, Q.fixed=NULL, E=NULL,
  gammaslope.des="2PL", gammaslope=NULL, gammaslope.fixed=NULL,
  est.some.slopes=TRUE, gammaslope.max=9.99, gammaslope.constr.V=NULL,
  gammaslope.constr.c=NULL, gammaslope.center.index=NULL,  gammaslope.center.value=NULL,
  gammaslope.prior=NULL, userfct.gammaslope=NULL, gammaslope.constr.Npars=0,
  est.guess=NULL, guess=rep(0, ncol(resp)),
  guess.prior=NULL, max.guess=0.5, skillspace="normal", theta.k=NULL,
  delta.designmatrix=NULL, delta.fixed=NULL, delta.inits=NULL, pweights=NULL,
  item.elim=TRUE, verbose=TRUE, control=list(), Edes=NULL )

## S3 method for class 'tam.mml.3pl'
summary(object,file=NULL,...)

## S3 method for class 'tam.mml.3pl'
print(x,...)

Arguments

`resp`	Data frame with polytomous item responses k=0,...,K. Missing responses must be declared as `NA`.
`Y`	A matrix of covariates in latent regression. Note that the intercept is automatically included as the first predictor.
`group`	An optional vector of group identifiers
`formulaY`	An R formula for latent regression. Transformations of predictors in Y (included in `dataY`) can be easily specified, e. g. `female*race` or `I(age^2)`.
`dataY`	An optional data frame with possible covariates Y in latent regression. This data frame will be used if an R formula in `formulaY` is specified.
`ndim`	Number of dimensions (is not needed to determined by the user)
`pid`	An optional vector of person identifiers
`xsi.fixed`	A matrix with two columns for fixing ξ parameters. 1st column: index of ξ parameter, 2nd column: fixed value
`xsi.inits`	A matrix with two columns (in the same way defined as in `xsi.fixed` with initial value for ξ.
`xsi.prior`	Item-specific prior distribution for ξ parameters. It is assumed that ξ_k \sim N( μ_k, σ_k^2 ). The first column in `xsi.prior` is μ_k, the second is σ_k.
`beta.fixed`	A matrix with three columns for fixing regression coefficients. 1st column: Index of Y value, 2nd column: dimension, 3rd column: fixed β value. If no constraints should be imposed on β, then set `beta.fixed=FALSE` (see Example 2, Model `2_4`).
`beta.inits`	A matrix (same format as in `beta.fixed`) with initial β values
`variance.fixed`	An optional matrix. In case of a single group it is a matrix with three columns for fixing entries in covariance matrix: 1st column: dimension 1, 2nd column: dimension 2, 3rd column: fixed value of covariance/variance. In case of multiple groups, it is a matrix with four columns 1st column: group index (from 1,…,G, 2nd column: dimension 1, 3rd column: dimension 2, 4th column: fixed value of covariance
`variance.inits`	Initial covariance matrix in estimation. All matrix entries have to be specified and this matrix is NOT in the same format like `variance.fixed`.
`est.variance`	Should the covariance matrix be estimated? This argument applies to estimated item slopes in `tam.mml.2pl`. The default is `FALSE` which means that latent variables (in the first group) are standardized in 2PL estimation.
`A`	An optional array of dimension I \times (K+1) \times N_ξ. Only ξ parameters are estimated, entries in A only correspond to the design.
`notA`	An optional logical indicating whether all entries in the A matrix are set to zero and no item intercept ξ should be estimated.
`Q`	An optional I \times D matrix (the Q-matrix) which specifies the loading structure of items on dimensions.
`Q.fixed`	Optional I \times D matrix of the same dimensions like `Q`. Non `NA` entries contain values at which item loadings should be fixed to.
`E`	Optional design array for item slopes γ. It is a four dimensional array of size I \times (K+1) \times D \times N_γ containing items, categories, dimensions, γ parameter.
`gammaslope.des`	Optional string indicating type of item slope parameter to be estimated. `gammaslope.des="2PL"` estimates a slope parameter for an item, `gammaslope.des="2PLcat"` for an item and a
`gammaslope`	Initial or fixed vector of γ parameters
`gammaslope.fixed`	An optional matrix containing fixed values in the γ vector. First column: parameter index; second colunmn: fixed value.
`est.some.slopes`	An optional logical indicating whether some item slopes should be estimated.
`gammaslope.max`	Value for absolute entries in γ vector
`gammaslope.constr.V`	An optional constraint matrix V for item slope parameters γ
`gammaslope.constr.c`	An optional constraint vector c for item slope parameters γ
`gammaslope.center.index`	Indices of `gammaslope` parameters which should be fixed to sum specified in `gammaslope.center.value` (see Example 7).
`gammaslope.center.value`	Specified values of sum of subset of `gammaslope` parameters.
`gammaslope.prior`	Item-specific prior distribution for γ parameters. It is assumed that γ_k \sim N( μ_k, σ_k^2 ). The first column in `gammaslope.prior` is μ_k, the second is σ_k.
`userfct.gammaslope`	A user specified function for constraints or transformations of the γ parameters within the algorithm. See Example 17 in `tam.mml`.
`gammaslope.constr.Npars`	Number of constrained γ parameters in `userfct.gammaslope`
`est.guess`	An optional vector of integers indicating for which items a guessing parameter should be estimated. Same integers correspond to same estimated guessing parameters. A value of 0 denotes an item for which no guessing parameter is estimated.
`guess`	Fixed or initial guessing parameters
`guess.prior`	Item-specific prior distribution for guessing parameters c_i. It is assumed that c_i \sim Beta(a_i, b_i). The first column in `gammaslope.prior` is a_i, the second is b_i.
`max.guess`	Upper bound for guessing parameters
`skillspace`	Skill space: normal distribution (`"normal"`) or discrete distribution (`"discrete"`).
`theta.k`	A matrix of the \bold{θ} skill space in case of a discrete distribution (`skillspace="discrete"`).
`delta.designmatrix`	A design matrix of the log-linear model for the skill space in case of a discrete distribution (`skillspace="discrete"`).
`delta.fixed`	Fixed δ values of the log-linear skill space. `delta.fixed` must be a matrix with three columns. First column: δ parameter index, Second column: Group index, Third column: Fixed δ parameter value.
`delta.inits`	Optional initial matrix of δ parameters.
`pweights`	Optional vector of person weights.
`item.elim`	Optional logical indicating whether an item with has only zero entries should be removed from the analysis. The default is `TRUE`.
`verbose`	Logical indicating whether output should be printed during iterations. This argument replaces `control$progress`.
`control`	See `tam.mml` for more details.
`Edes`	Compact form of design matrix. This argument is only defined for convenience for models with random starting values to avoid recalculations.
`object`	Object of class `tam.mml.3pl`
`file`	A file name in which the summary output will be written
`x`	Object of class `tam.mml.3pl`
`...`	Further arguments to be passed

Details

The item response model for item i and category h and no guessing parameters can be written as

P( X_{i}=h | \bold{θ} ) \propto \exp( ∑_d b_{ihd} θ_d + ∑_k a_{ih} ξ_k )

For item slopes, a linear decomposition is allowed

b_{ihd}=∑_k e_{ihdk} γ_k

In case of a guessing parameter, the item response function is

P( X_{i}=h | \bold{θ} )=c_i + ( 1 - c_i ) \cdot ( 1 + \exp( - ∑_d b_{ihd} θ_d - ∑_k a_{ih} ξ_k ) )^{-1}

For possibilities of definitions of the design matrix E=(e_{ihdk}) see Formann (2007), for the strongly related linear logistic latent class model see Formann (1992). Linear constraints on γ can be specified by equations V γ=c and using the arguments gammaslope.constr.V and gammaslope.constr.c.

Like in tam.mml, a multivariate linear regression

\bold{θ}=Y β + \bold{ε}

assuming a multivariate normal distribution on the residuals \bold{ε} can be specified (skillspace="normal").

Alternatively, a log-linear distribution of the skill classes P(θ) (skillspace="discrete") is performed

\log P(θ )=D_{ δ } δ

See Xu and von Davier (2008). The design matrix D_{δ} can be specified in delta.designmatrix. The theta grid \bold{θ} of the skill space can be defined in theta.k.

Value

The same output as in tam.mml and additional entries

`delta`	Parameter vector δ
`gammaslope`	Estimated γ item slope parameters
`se.gammaslope`	Standard errors γ item slope parameters
`E`	Used design matrix E
`Edes`	Used design matrix E in compact form
`guess`	Estimated c guessing parameters
`se.guess`	Standard errors c guessing parameters

Note

The implementation of the model builds on pieces work of Anton Formann. See http://www.antonformann.at/ for more information.

References

Formann, A. K. (1992). Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association, 87, 476-486. doi: 10.2307/2290280

Formann, A. K. (2007). (Almost) Equivalence between conditional and mixture maximum likelihood estimates for some models of the Rasch type. In M. von Davier & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models (pp. 177-189). New York: Springer. doi: 10.1007/978-0-387-49839-3_11

Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data. ETS Research Report ETS RR-08-27. Princeton, ETS. doi: 10.1002/j.2333-8504.2008.tb02113.x

Examples

## Not run: 
#############################################################################
# EXAMPLE 1: Dichotomous data | data.sim.rasch
#############################################################################

data(data.sim.rasch)
dat <- data.sim.rasch
# some control arguments
ctl.list <- list(maxiter=100) # increase the number of iterations in applications!

#*** Model 1: Rasch model, normal trait distribution
mod1 <- TAM::tam.mml.3pl(resp=dat, skillspace="normal", est.some.slopes=FALSE,
              control=ctl.list)
summary(mod1)

#*** Model 2: Rasch model, discrete trait distribution
#  choose theta grid
theta.k <- seq( -3, 3, len=7 )    # discrete theta grid distribution
# define symmetric trait distribution
delta.designmatrix <- matrix( 0, nrow=7, ncol=4 )
delta.designmatrix[4,1] <- 1
delta.designmatrix[c(3,5),2] <- 1
delta.designmatrix[c(2,6),3] <- 1
delta.designmatrix[c(1,7),4] <- 1
mod2 <- TAM::tam.mml.3pl(resp=dat, skillspace="discrete", est.some.slopes=FALSE,
           theta.k=theta.k, delta.designmatrix=delta.designmatrix, control=ctl.list)
summary(mod2)

#*** Model 3: 2PL model
mod3 <- TAM::tam.mml.3pl(resp=dat, skillspace="normal", gammaslope.des="2PL",
       control=ctl.list, est.variance=FALSE )
summary(mod3)

#*** Model 4: 3PL model
# estimate guessing parameters for items 3,7,9 and 12
I <- ncol(dat)
est.guess <- rep(0, I )
# set parameters 9 and 12 equal -> same integers
est.guess[ c(3,7,9,12) ] <- c( 1, 3, 2, 2 )
# starting values guessing parameter
guess <- .2*(est.guess > 0)
# estimate model
mod4 <- TAM::tam.mml.3pl(resp=dat, skillspace="normal", gammaslope.des="2PL",
        control=ctl.list, est.guess=est.guess, guess=guess, est.variance=FALSE)
summary(mod4)

#--- specification in tamaan
tammodel <- "
LAVAAN MODEL:
  F1=~ I1__I40
  F1 ~~ 1*F1
  I3 + I7 ?=g1
  I9 + I12 ?=g912 * g1
    "
mod4a <- TAM::tamaan( tammodel, resp=dat, control=list(maxiter=20))
summary(mod4a)

#*** Model 5: 3PL model, add some prior Beta distribution
guess.prior <- matrix( 0, nrow=I, ncol=2 )
guess.prior[ est.guess  > 0, 1] <- 5
guess.prior[ est.guess  > 0, 2] <- 17
mod5 <- TAM::tam.mml.3pl(resp=dat, skillspace="normal", gammaslope.des="2PL",
        control=ctl.list, est.guess=est.guess, guess=guess, guess.prior=guess.prior)
summary(mod5)

#--- specification in tamaan
tammodel <- "
LAVAAN MODEL:
  F1=~ I1__I40
  F1 ~~ 1*F1
  I3 + I7 ?=g1
  I9 + I12 ?=g912 * g1
MODEL PRIOR:
  g912 ~ Beta(5,17)
  I3_guess ~ Beta(5,17)
  I7_guess ~ Beta(5,17)
    "
mod5a <- TAM::tamaan( tammodel, resp=dat, control=list(maxiter=20))

#*** Model 6: 2PL model with design matrix for item slopes
I <- 40         # number of items
D <- 1       # dimensions
maxK <- 2    # maximum number of categories
Ngam <- 13   # number of different slope parameters
E <- array( 0, dim=c(I,maxK,D,Ngam) )
# joint slope parameters for items 1 to 10, 11 to 20, 21 to 30
E[ 1:10, 2, 1, 2 ] <- 1
E[ 11:20, 2, 1, 1 ] <- 1
E[ 21:30, 2, 1, 3 ] <- 1
for (ii in 31:40){   E[ii,2,1,ii - 27 ] <- 1 }
# estimate model
mod6 <- TAM::tam.mml.3pl(resp=dat, control=ctl.list,   E=E, est.variance=FALSE  )
summary(mod6)

#*** Model 6b: Truncated normal prior distribution for slope parameters
gammaslope.prior <- matrix( 0, nrow=Ngam, ncol=4 )
gammaslope.prior[,1] <- 2  # mean
gammaslope.prior[,2] <- 10  # standard deviation
gammaslope.prior[,3] <- -Inf # lower bound
gammaslope.prior[ 4:13,3] <- 1.2
gammaslope.prior[,4] <- Inf  # upper bound
# estimate model
mod6b <- TAM::tam.mml.3pl(resp=dat, E=E, est.variance=FALSE,
                gammaslope.prior=gammaslope.prior, control=ctl.list )
summary(mod6b)

#*** Model 7: 2PL model with design matrix of slopes and slope constraints
Ngam <- dim(E)[4]   # number of gamma parameters
# define two constraint equations
gammaslope.constr.V <- matrix( 0, nrow=Ngam, ncol=2 )
gammaslope.constr.c <- rep(0,2)
# set sum of first two xlambda entries to 1.8
gammaslope.constr.V[1:2,1] <- 1
gammaslope.constr.c[1] <- 1.8
# set sum of entries 4, 5 and 6 to 3
gammaslope.constr.V[4:6,2] <- 1
gammaslope.constr.c[2] <- 3
mod7 <- TAM::tam.mml.3pl(resp=dat, control=ctl.list,  E=E,  est.variance=FALSE,
   gammaslope.constr.V=gammaslope.constr.V, gammaslope.constr.c=gammaslope.constr.c)
summary(mod7)

#**** Model 8: Located latent class Rasch model with estimated three skill points

# three classes of theta's are estimated
TP <- 3
theta.k <- diag(TP)
# because item difficulties are unrestricted, we define the sum of the estimated
# theta points equal to zero
Ngam <- TP  # estimate three gamma loading parameters which are discrete theta points
E <- array( 0, dim=c(I,2,TP,Ngam) )
E[,2,1,1] <- E[,2,2,2] <- E[,2,3,3] <- 1
gammaslope.constr.V <- matrix( 1, nrow=3, ncol=1 )
gammaslope.constr.c <- c(0)
# initial gamma values
gammaslope <- c( -2, 0, 2 )
# estimate model
mod8 <- TAM::tam.mml.3pl(resp=dat, control=ctl.list,  E=E,  skillspace="discrete",
     theta.k=theta.k, gammaslope=gammaslope, gammaslope.constr.V=gammaslope.constr.V,
     gammaslope.constr.c=gammaslope.constr.c )
summary(mod8)

#*** Model 9: Multidimensional multiple group model
N <- nrow(dat)
I <- ncol(dat)
group <- c( rep(1,N/4), rep(2,N/4), rep(3,N/2) )
Q <- matrix(0,nrow=I,ncol=2)
Q[ 1:(I/2), 1] <- Q[ seq(I/2+1,I), 2] <- 1
# estimate model
mod9 <- TAM::tam.mml.3pl(resp=dat, skillspace="normal", est.some.slopes=FALSE,
               group=group, Q=Q)
summary(mod9)

#############################################################################
# EXAMPLE 2: Polytomous data
#############################################################################

data( data.mg, package="CDM")
dat <- data.mg[1:1000, paste0("I",1:11)]

#*******************************************************
#*** Model 1: 1-dimensional 1PL estimation, normal skill distribution
mod1 <- TAM::tam.mml.3pl(resp=dat, skillspace="normal",
           gammaslope.des="2PL", est.some.slopes=FALSE,  est.variance=TRUE  )
summary(mod1)

#*******************************************************
#*** Model 2: 1-dimensional 2PL estimation, discrete skill distribution
# define skill space
theta.k <- matrix( seq(-5,5,len=21) )
# allow skew skill distribution
delta.designmatrix <- cbind( 1, theta.k, theta.k^2, theta.k^3 )
# fix 13th xsi item parameter to zero
xsi.fixed <- cbind( 13, 0 )
# fix 10th slope paremeter to one
gammaslope.fixed <- cbind( 10, 1 )
# estimate model
mod2 <- TAM::tam.mml.3pl(resp=dat, skillspace="discrete", theta.k=theta.k,
          delta.designmatrix=delta.designmatrix, gammaslope.des="2PL", xsi.fixed=xsi.fixed,
          gammaslope.fixed=gammaslope.fixed)
summary(mod2)

#*******************************************************
#*** Model 3: 2-dimensional 2PL estimation, normal skill distribution

# define loading matrix
Q <- matrix(0,11,2)
Q[1:6,1] <- 1   # items 1 to 6 load on dimension 1
Q[7:11,2] <- 1  # items 7 to 11 load on dimension 2
# estimate model
mod3 <- TAM::tam.mml.3pl(resp=dat, gammaslope.des="2PL", Q=Q )
summary(mod3)

#############################################################################
# EXAMPLE 3: Dichotomous data with guessing
#############################################################################

#*** simulate data
set.seed(9765)
N <- 4000   # number of persons
I <- 20     # number of items
b <- seq( -1.5, 1.5, len=I )
guess <- rep(0, I )
guess.items <- c(6,11,16)
guess[ guess.items ] <- .33
library(sirt)
dat <- sirt::sim.raschtype( stats::rnorm(N), b=b, fixed.c=guess )

#*******************************************************
#*** Model 1: Difficulty + guessing model, i.e. fix slopes to 1
est.guess <- rep(0,I)
est.guess[ guess.items ] <- seq(1, length(guess.items) )
# define prior distribution
guess.prior <- matrix( cbind( 5, 17 ), I, 2, byrow=TRUE )
guess.prior[ ! est.guess, ] <- 0
# estimate model
mod1 <- TAM::tam.mml.3pl(resp=dat, guess=guess, est.guess=est.guess,
           guess.prior=guess.prior, control=ctl.list,est.variance=TRUE,
           est.some.slopes=FALSE )
summary(mod1)

#*******************************************************
#*** Model 2: estimate a joint guessing parameter
est.guess <- rep(0,I)
est.guess[ guess.items ] <- 1
# estimate model
mod2 <- TAM::tam.mml.3pl(resp=dat, guess=guess, est.guess=est.guess,
            guess.prior=guess.prior, control=ctl.list,est.variance=TRUE,
            est.some.slopes=FALSE )
summary(mod2)

#############################################################################
# EXAMPLE 4: Latent class model with two classes
#      See slca Simulated Example 2 in the CDM package
#############################################################################

#*******************************************************
#*** simulate data
set.seed(9876)
I <- 7  # number of items
# simulate response probabilities
a1 <- round( stats::runif(I,0, .4 ),4)
a2 <- round( stats::runif(I, .6, 1 ),4)
N <- 1000   # sample size
# simulate data in two classes of proportions .3 and .7
N1 <- round(.3*N)
dat1 <- 1 * ( matrix(a1,N1,I,byrow=TRUE) > matrix( stats::runif( N1 * I), N1, I ) )
N2 <- round(.7*N)
dat2 <- 1 * ( matrix(a2,N2,I,byrow=TRUE) > matrix( stats::runif( N2 * I), N2, I ) )
dat <- rbind( dat1, dat2 )
colnames(dat) <- paste0("I", 1:I)

#*******************************************************
# estimation using tam.mml.3pl function

# define design matrices
TP <- 2     # two classes
theta.k <- diag(TP)     # there are theta dimensions -> two classes
# The idea is that latent classes refer to two different "dimensions".
# Items load on latent class indicators 1 and 2, see below.
E <- array(0, dim=c(I,2,2,2*I) )
items <- colnames(dat)
dimnames(E)[[4]] <- c(paste0( colnames(dat), "Class", 1),
          paste0( colnames(dat), "Class", 2) )
# items, categories, classes, parameters
# probabilities for correct solution
for (ii in 1:I){
    E[ ii, 2, 1, ii ] <- 1    # probabilities class 1
    E[ ii, 2, 2, ii+I ] <- 1  # probabilities class 2
                    }

# estimation command
mod1 <- TAM::tam.mml.3pl(resp=dat, E=E, control=list(maxit=20), skillspace="discrete",
          theta.k=theta.k, notA=TRUE)
summary(mod1)
# compare simulated and estimated data
cbind( mod1$rprobs[,2,1], a2 )  # Simulated class 2
cbind( mod1$rprobs[,2,2], a1 )  # Simulated class 1

#*******************************************************
#** specification with tamaan
tammodel <- "
ANALYSIS:
  TYPE=LCA;
  NCLASSES(2);   # 2 classes
  NSTARTS(5,20); # 5 random starts with 20 iterations
LAVAAN MODEL:
  F=~ I1__I7
    "
mod1b <- TAM::tamaan( tammodel, resp=dat )
summary(mod1b)
# compare with mod1
logLik(mod1)
logLik(mod1b)

#############################################################################
# EXAMPLE 5: Located latent class model, Rasch model
#      See slca Simulated Example 4 in the CDM package
#############################################################################

#*** simulate data
set.seed(487)
I <- 15  # I items
b1 <- seq( -2, 2, len=I)      # item difficulties
N <- 2000    # number of persons
# simulate 4 theta classes
theta0 <- c( -2.5, -1, 0.3, 1.3 )  # skill classes
probs0 <- c( .1, .4, .2, .3 )      # skill class probabilities
TP <- length(theta0)
theta <- theta0[ rep(1:TP, round(probs0*N)  ) ]
library(sirt)
dat <- sirt::sim.raschtype( theta, b1 )
colnames(dat) <- paste0("I",1:I)

#*******************************************************
#*** Model 1: Located latent class model with 4 classes
maxK <- 2
Ngam <- TP
E <- array( 0, dim=c(I, maxK, TP,  Ngam ) )
dimnames(E)[[1]] <- colnames(dat)
dimnames(E)[[2]] <- paste0("Cat", 1:(maxK) )
dimnames(E)[[3]] <- paste0("Class", 1:TP)
dimnames(E)[[4]] <- paste0("theta", 1:TP)
# theta design
for (tt in 1:TP){   E[1:I, 2, tt,  tt] <- 1       }
theta.k <- diag(TP)
# set eighth item difficulty to zero
xsi.fixed <- cbind( 8, 0 )
# initial gamma parameter
gammaslope <- seq( -1.5, 1.5, len=TP)
# estimate model
mod1 <- TAM::tam.mml.3pl(resp=dat, E=E, xsi.fixed=xsi.fixed,
           control=list(maxiter=100), skillspace="discrete",
           theta.k=theta.k, gammaslope=gammaslope)
summary(mod1)
# compare estimated and simulated theta class locations
cbind( mod1$gammaslope, theta0 )
# compare estimated and simulated latent class proportions
cbind( mod1$pi.k, probs0 )

#----- specification using tamaan
tammodel <- "
ANALYSIS:
  TYPE=LOCLCA;
  NCLASSES(4)
LAVAAN MODEL:
  F=~ I1__I15
  I8 | 0*t1
    "
mod1b <- TAM::tamaan( tammodel, resp=dat )
summary(mod1b)

#############################################################################
# EXAMPLE 6: DINA model with two skills
#      See slca Simulated Example 5 in the CDM package
#############################################################################

#*** simulate data
set.seed(487)
N <- 3000   # number of persons
# define Q-matrix
I <- 9  # 9 items
NS <- 2 # 2 skills
TP <- 4 # number of skill classes
Q <- scan(nlines=3, text=
  "1 0   1 0   1 0
   0 1   0 1   0 1
   1 1   1 1   1 1",
   )
Q <- matrix(Q, I, ncol=NS, byrow=TRUE)
# define skill distribution
alpha0 <- matrix( c(0,0,1,0,0,1,1,1), nrow=4,ncol=2,byrow=TRUE)
prob0 <- c( .2, .4, .1, .3 )
alpha <- alpha0[ rep( 1:TP, prob0*N),]
# define guessing and slipping parameters
guess <- round( stats::runif(I, 0, .4 ), 2 )
slip <- round( stats::runif(I, 0, .3 ), 2 )
# simulate data according to the DINA model
dat <- CDM::sim.din( q.matrix=Q, alpha=alpha, slip=slip, guess=guess )$dat

#*** Model 1: Estimate DINA model
# define E matrix which contains the anti-slipping parameters
maxK <- 2
Ngam <- I
E <- array( 0, dim=c(I, maxK, TP,  Ngam ) )
dimnames(E)[[1]] <- colnames(dat)
dimnames(E)[[2]] <- paste0("Cat", 1:(maxK) )
dimnames(E)[[3]] <- c("S00","S10","S01","S11")
dimnames(E)[[4]] <- paste0( "antislip", 1:I )
# define anti-slipping parameters in E
for (ii in 1:I){
        # define latent responses
        latresp <- 1*( alpha0 %*% Q[ii,]==sum(Q[ii,]) )[,1]
        # model slipping parameters
        E[ii, 2, latresp==1, ii ] <- 1
                 }
# skill space definition
theta.k <- diag(TP)
gammaslope <- rep( qlogis( .8 ), I )

# estimate model
mod1 <- TAM::tam.mml.3pl(resp=dat, E=E, control=list(maxiter=100), skillspace="discrete",
          theta.k=theta.k, gammaslope=gammaslope)
summary(mod1)
# compare estimated and simulated latent class proportions
cbind( mod1$pi.k, probs0 )
# compare estimated and simulated guessing parameters
cbind( mod1$rprobs[,2,1], guess )
# compare estimated and simulated slipping parameters
cbind( 1 - mod1$rprobs[,2,4], slip )

#############################################################################
# EXAMPLE 7: Mixed Rasch model with two classes
#      See slca Simulated Example 3 in the CDM package
#############################################################################

#*** simulate data
set.seed(987)
library(sirt)
# simulate two latent classes of Rasch populations
I <- 15  # 6 items
b1 <- seq( -1.5, 1.5, len=I)      # difficulties latent class 1
b2 <- b1    # difficulties latent class 2
b2[ c(4,7, 9, 11, 12, 13) ] <- c(1, -.5, -.5, .33, .33, -.66 )
b2 <- b2 - mean(b2)
N <- 3000    # number of persons
wgt <- .25       # class probability for class 1
# class 1
dat1 <- sirt::sim.raschtype( stats::rnorm( wgt*N ), - b1 )
# class 2
dat2 <- sirt::sim.raschtype( stats::rnorm( (1-wgt)*N, mean=1, sd=1.7), - b2 )
dat <- rbind( dat1, dat2 )

# The idea is that each grid point class x theta is defined as new
# dimension. If we approximate the trait distribution by 7 theta points
# and are interested in estimating 2 latent classes, then we need
# 7*2=14 dimensions.

#*** Model 1: Rasch model
# theta grid
theta.k1 <- seq( -5, 5, len=7 )
TT <- length(theta.k1)
#-- define theta design matrix
theta.k <- diag(TT)
#-- delta designmatrix
delta.designmatrix <- matrix( 0, TT, ncol=3 )
delta.designmatrix[, 1] <- 1
delta.designmatrix[, 2:3] <- cbind( theta.k1, theta.k1^2 )

#-- define loading matrix E
E <- array( 0, dim=c(I,2,TT,I + 1) )  # last parameter is constant 1
for (ii in 1:I){
    E[ ii, 2, 1:TT, ii ] <- -1   # '-b' in '1*theta - b'
    E[ ii, 2, 1:TT, I+1] <- theta.k1  # '1*theta' in '1*theta - b'
                }
# initial gammaslope parameters
par1 <- stats::qlogis( colMeans( dat ) )
gammaslope <- c( par1, 1 )
# sum constraint of zero on item difficulties
gammaslope.constr.V <- matrix( 0, I+1, 1 )
gammaslope.constr.V[ 1:I, 1] <- 1  # Class 1
gammaslope.constr.c <- c(0)
# fixed gammaslope parameter
gammaslope.fixed <- cbind( I+1, 1 )
# estimate model
mod1 <- TAM::tam.mml.3pl(resp=dat1, E=E, skillspace="discrete",
           theta.k=theta.k, delta.designmatrix=delta.designmatrix,
           gammaslope=gammaslope, gammaslope.constr.V=gammaslope.constr.V,
           gammaslope.constr.c=gammaslope.constr.c, gammaslope.fixed=gammaslope.fixed,
           notA=TRUE, est.variance=FALSE)
summary(mod1)

#*** Model 2: Mixed Rasch model with two latent classes
# theta grid
theta.k1 <- seq( -4, 4, len=7 )
TT <- length(theta.k1)
#-- define theta design matrix
theta.k <- diag(2*TT)   # 2*7=14 classes
#-- delta designmatrix
delta.designmatrix <- matrix( 0, 2*TT, ncol=6 )
# Class 1
delta.designmatrix[1:TT, 1] <- 1
delta.designmatrix[1:TT, 2:3] <- cbind( theta.k1, theta.k1^2 )
# Class 2
delta.designmatrix[TT+1:TT, 4] <- 1
delta.designmatrix[TT+1:TT, 5:6] <- cbind( theta.k1, theta.k1^2 )

#-- define loading matrix E
E <- array( 0, dim=c(I,2,2*TT,2*I + 1) )  # last parameter is constant 1
dimnames(E)[[1]] <- colnames(dat)
dimnames(E)[[2]] <- c("Cat0","Cat1")
dimnames(E)[[3]] <- c( paste0("Class1_theta", 1:TT), paste0("Class2_theta", 1:TT) )
dimnames(E)[[4]] <- c( paste0("b_Class1_", colnames(dat)),
       paste0("b_Class2_", colnames(dat)), "One")
for (ii in 1:I){
  # Class 1 item parameters
    E[ ii, 2, 1:TT, ii ] <- -1   # '-b' in '1*theta - b'
    E[ ii, 2, 1:TT, 2*I+1] <- theta.k1  # '1*theta' in '1*theta - b'
  # Class 2 item parameters
    E[ ii, 2, TT + 1:TT, I + ii ] <- -1
    E[ ii, 2, TT + 1:TT, 2*I+1] <- theta.k1
                }
# initial gammaslope parameters
par1 <- qlogis( colMeans( dat ) )
gammaslope <- c( par1, par1 + stats::runif(I, -2,2 ), 1 )
# sum constraint of zero on item difficulties within a class
gammaslope.center.index <- c( rep( 1, I ), rep(2,I), 0 )
gammaslope.center.value <- c(0,0)

# estimate model
mod1 <- TAM::tam.mml.3pl(resp=dat, E=E, skillspace="discrete",
            theta.k=theta.k, delta.designmatrix=delta.designmatrix,
            gammaslope=gammaslope, gammaslope.center.index=gammaslope.center.index,
            gammaslope.center.value=gammaslope.center.value, gammaslope.fixed=gammaslope.fixed,
            notA=TRUE)
summary(mod1)
# latent class proportions
stats::aggregate( mod1$pi.k, list( rep(1:2, each=TT)), sum )
# compare simulated and estimated item parameters
cbind( b1, b2, - mod1$gammaslope[1:I], - mod1$gammaslope[I + 1:I ] )

#--- specification in tamaan
tammodel <- "
ANALYSIS:
  TYPE=MIXTURE;
  NCLASSES(2)
  NSTARTS(5,30)
LAVAAN MODEL:
  F=~ I0001__I0015
ITEM TYPE:
  ALL(Rasch);
    "
mod1b <- TAM::tamaan( tammodel, resp=dat )
summary(mod1b)

#############################################################################
# EXAMPLE 8: 2PL mixture distribution model
#############################################################################

#*** simulate data
set.seed(9187)
library(sirt)
# simulate two latent classes of Rasch populations
I <- 20
b1 <- seq( -1.5, 1.5, len=I)      # difficulties latent class 1
b2 <- b1    # difficulties latent class 2
b2[ c(4,7, 9, 11, 12, 13, 16, 18) ] <- c(1, -.5, -.5, .33, .33, -.66, -1, .3)
# b2 <- scale( b2, scale=FALSE)
b2 <- b2 - mean(b2)
N <- 4000       # number of persons
wgt <- .75       # class probability for class 1
# item slopes
a1 <- rep( 1, I )  # first class
a2 <- rep( c(.5,1.5), I/2 )

# class 1
dat1 <- sirt::sim.raschtype( stats::rnorm( wgt*N ), - b1, fixed.a=a1)
# class 2
dat2 <- sirt::sim.raschtype( stats::rnorm( (1-wgt)*N, mean=1, sd=1.4), - b2, fixed.a=a2)
dat <- rbind( dat1, dat2 )

#*** Model 1: Mixed 2PL model with two latent classes

theta.k1 <- seq( -4, 4, len=7 )
TT <- length(theta.k1)
#-- define theta design matrix
theta.k <- diag(2*TT)   # 2*7=14 classes
#-- delta designmatrix
delta.designmatrix <- matrix( 0, 2*TT, ncol=6 )
# Class 1
delta.designmatrix[1:TT, 1] <- 1
delta.designmatrix[1:TT, 2:3] <- cbind( theta.k1, theta.k1^2 )
# Class 2
delta.designmatrix[TT+1:TT, 4] <- 1
delta.designmatrix[TT+1:TT, 5:6] <- cbind( theta.k1, theta.k1^2 )

#-- define loading matrix E
E <- array( 0, dim=c(I,2,2*TT,4*I ) )
dimnames(E)[[1]] <- colnames(dat)
dimnames(E)[[2]] <- c("Cat0","Cat1")
dimnames(E)[[3]] <- c( paste0("Class1_theta", 1:TT), paste0("Class2_theta", 1:TT) )
dimnames(E)[[4]] <- c( paste0("b_Class1_", colnames(dat)),
                       paste0("a_Class1_", colnames(dat)),
                       paste0("b_Class2_", colnames(dat)),
                       paste0("a_Class2_", colnames(dat)) )

for (ii in 1:I){
  # Class 1 item parameters
    E[ ii, 2, 1:TT, ii ] <- -1           # '-b' in 'a*theta - b'
    E[ ii, 2, 1:TT, I + ii] <- theta.k1  # 'a*theta' in 'a*theta - b'
  # Class 2 item parameters
    E[ ii, 2, TT + 1:TT, 2*I + ii ] <- -1
    E[ ii, 2, TT + 1:TT, 3*I + ii ] <- theta.k1
}

# initial gammaslope parameters
par1 <- scale( - stats::qlogis( colMeans( dat ) ), scale=FALSE )
gammaslope <- c( par1, rep(1,I),  scale( par1 + runif(I, - 1.4, 1.4 ),
        scale=FALSE), stats::runif( I,.6,1.4) )

# constraint matrix
gammaslope.constr.V <- matrix( 0, 4*I, 4 )
# sum of item intercepts equals zero
gammaslope.constr.V[ 1:I, 1] <- 1        # Class 1 (b)
gammaslope.constr.V[ 2*I + 1:I, 2] <- 1  # Class 2 (b)
# sum of item slopes equals number of items -> mean slope of 1
gammaslope.constr.V[ I + 1:I, 3] <- 1    # Class 1 (a)
gammaslope.constr.V[ 3*I + 1:I, 4] <- 1  # Class 2 (a)
gammaslope.constr.c <- c(0,0,I,I)

# estimate model
mod1 <- TAM::tam.mml.3pl(resp=dat, E=E, control=list(maxiter=80), skillspace="discrete",
      theta.k=theta.k, delta.designmatrix=delta.designmatrix,
      gammaslope=gammaslope, gammaslope.constr.V=gammaslope.constr.V,
      gammaslope.constr.c=gammaslope.constr.c,  gammaslope.fixed=gammaslope.fixed,
      notA=TRUE)

# estimated item parameters
mod1$gammaslope
# summary
summary(mod1)
# latent class proportions
round( stats::aggregate( mod1$pi.k, list( rep(1:2, each=TT)), sum ), 3 )
# compare simulated and estimated item intercepts
int <- cbind( b1*a1, b2 * a2, - mod1$gammaslope[1:I], - mod1$gammaslope[2*I + 1:I ] )
round( int, 3 )
# simulated and estimated item slopes
slo <- cbind( a1, a2,  mod1$gammaslope[I+1:I], mod1$gammaslope[3*I + 1:I ] )
round(slo,3)

#--- specification in tamaan
tammodel <- "
ANALYSIS:
  TYPE=MIXTURE;
  NCLASSES(2)
  NSTARTS(10,25)
LAVAAN MODEL:
  F=~ I0001__I0020
    "
mod1t <- TAM::tamaan( tammodel, resp=dat )
summary(mod1t)

#############################################################################
# EXAMPLE 9: Toy example: Exact representation of an item by a factor
#############################################################################

data(data.gpcm)
dat <- data.gpcm[,1,drop=FALSE ]   # choose first item
# some descriptives
( t1 <- table(dat) )

# The idea is that we define an IRT model with one latent variable
# which extactly corresponds to the manifest item.

I <- 1    # 1 item
K <- 4    # 4 categories
TP <- 4   # 4 discrete theta points

# define skill space
theta.k <- diag(TP)
# define loading matrix E
E <- array( -99, dim=c(I,K,TP,1 ) )
for (vv in 1:K){
    E[ 1, vv, vv, 1 ] <- 9
                }
# estimate model
mod1 <- TAM::tam.mml.3pl(resp=dat, E=E, skillspace="discrete",
         theta.k=theta.k,  notA=TRUE)
summary(mod1)
# -> the latent distribution corresponds to the manifest distribution, because ...
round( mod1$pi.k, 3 )
round( t1 / sum(t1), 3 )

#############################################################################
# EXAMPLE 10: Some fixed item loadings
#############################################################################

data(data.Students,package="CDM")
dat <- data.Students
# select variables
vars <- scan( nlines=1, what="character")
    act1 act2 act3 act4 act5 sc1 sc2 sc3 sc4
dat <- data.Students[, vars  ]

# define loading matrix: two-dimensional model
Q <- matrix( 0, nrow=9, ncol=2 )
Q[1:5,1] <- 1
Q[6:9,2] <- 1
# define some fixed item loadings
Q.fixed <- NA*Q
Q.fixed[ c(1,4), 1] <- .5
Q.fixed[ 6:7, 2 ] <- 1

# estimate model
mod3 <- TAM::tam.mml.3pl( resp=dat, gammaslope.des="2PL", Q=Q, Q.fixed=Q.fixed,
            control=list( maxiter=10, nodes=seq(-4,4,len=10) )  )
summary(mod3)

#############################################################################
# EXAMPLE 11: Mixed response formats - Multiple choice and partial credit items
#############################################################################

data(data.timssAusTwn.scored)
dat <- data.timssAusTwn.scored

# select columns with item responses
dat <- dat[, grep("M0", colnames(dat) ) ]
I <- ncol(dat)   # number of items

# The idea is to start with partial credit modelling
# and then to include the guessing parameters

#*** Model 0: Partial Credit Model
mod0 <- TAM::tam.mml(dat)
summary(mod0)

#*** Model 1 and Model 2: include guessing parameters

# multiple choice items
guess_items <- which( apply( dat, 2, max, na.rm=TRUE )==1 )
# define guessing parameters
guess0 <- rep(0,I)
guess0[ guess_items ] <- .25  # set guessing probability to .25

# define which guessing parameters should be estimated
est.guess1 <- rep(0,I)  # all parameters are fixed
est.guess2 <- 1 * ( guess0==.25 )  # joint guessing parameter

# use design matrix from partial credit model
A0 <- mod0$A

#--- Model 1: fixed guessing parameters of .25 and item slopes of 1
mod1 <- TAM::tam.mml.3pl( dat, guess=guess0, est.guess=est.guess1,
            A=A0, est.some.slopes=FALSE, control=list(maxiter=50) )
summary(mod1)

#--- Model 2: estimate joint guessing parameters and item slopes of 1
mod2 <- TAM::tam.mml.3pl( dat, guess=guess0, est.guess=est.guess2,
            A=A0, est.some.slopes=FALSE, control=list(maxiter=50) )
summary(mod2)

# model comparison
IRT.compareModels(mod0,mod1,mod2)

## End(Not run)

TAM

Test Analysis Modules

v3.6-45

GPL (>= 2)

Authors

Alexander Robitzsch [aut,cre] (<https://orcid.org/0000-0002-8226-3132>), Thomas Kiefer [aut], Margaret Wu [aut]

Initial release

2021-04-22 14:35:52