Conversion between sequence formats
Convert a sequence data set from one format to another.
seqformat(data, var = NULL, from, to, compress = FALSE, nrep = NULL, tevent, stsep = NULL, covar = NULL, SPS.in = list(xfix = "()", sdsep = ","), SPS.out = list(xfix = "()", sdsep = ","), id = 1, begin = 2, end = 3, status = 4, process = TRUE, pdata = NULL, pvar = NULL, limit = 100, overwrite = TRUE, fillblanks = NULL, tmin = NULL, tmax = NULL, missing = "*", with.missing = TRUE, right="DEL", compressed, nr)
data |
Data frame, matrix, or state sequence object. The data to use: a data frame or a matrix with sequence data in one or more columns (STS or SPS input), a data frame with one spell per row (from = "SPELL"), or a state sequence object created with seqdef (from = "STS").
var |
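List of Integers or Strings.
Default: NULL. The indexes or names of the columns of data that contain the sequences, for example 13:24 in the actcal examples below.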
from |
String.
The format of the input sequence data.
It can be "STS", "SPS", or "SPELL".
to |
String.
The format of the output data.
It can be "STS", "DSS", "SPS", "SRS", "TSE", or "SPELL".
compress |
Logical.
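Default: FALSE. Whether the output sequences are returned in compressed form, i.e., each sequence as a single character string (see the SPS and DSS examples below).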
nrep |
Integer.
The number of shifted replications when to = "SRS".
tevent |
Matrix.
The transition-definition matrix when to = "TSE", for example as created with seqetm.
stsep |
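Character.
Default: NULL. The state separator used when the input sequences are stored as character strings (compressed STS format).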
covar |
List of Integers or Strings.
The indexes or the names of additional columns in data to include as covariates in the output.
SPS.in |
List.
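Default: list(xfix = "()", sdsep = ","). The format of SPS input: xfix gives the prefix and suffix enclosing each state-duration pair, and sdsep the separator between the state and its duration.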
SPS.out |
List.
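Default: list(xfix = "()", sdsep = ","). The format of SPS output, with the same elements as SPS.in. For example, list(xfix = "", sdsep = "/") gives no prefix/suffix and "/" as state/duration separator.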
id |
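Integer or String.
Default: 1. The index or name of the column of data holding the individual IDs, for example id = "id" for the SPELL input in the examples below.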
begin |
Integer or String.
Default: 2. The index or name of the column of data holding the start time of each spell when from = "SPELL".
end |
Integer or String.
Default: 3. The index or name of the column of data holding the end time of each spell when from = "SPELL".
status |
Integer or String.
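Default: 4. The index or name of the column of data holding the state (status) of each spell when from = "SPELL".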
process |
Logical.
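Default: TRUE. Whether the sequences are aligned on ages (process time) or on calendar dates when converting between spells and sequences; see Details.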
pdata |
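Default: NULL. A data frame containing the ID and the birth time of the individuals, used to convert between dates and ages when converting from or to SPELL format (see process and Details).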
pvar |
List of Integers or Strings.
The indexes or names of the columns of the data frame pdata that contain the ID and the birth time of the individuals, in that order.
limit |
Integer.
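Default: 100. The maximum length, in positions, of the age-aligned output sequences when from = "SPELL" and process = TRUE (limit = 16 yields 16 columns in the example below).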
overwrite |
Logical.
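Default: TRUE. When from = "SPELL" and spells overlap, whether the more recent spell overwrites the older one.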
fillblanks |
Character.
Default: NULL. The value used to fill gaps between episodes when from = "SPELL".
tmin |
Integer.
Default: NULL. The first time point of the date-aligned output sequences (process = FALSE).
tmax |
Integer.
Default: NULL. The last time point of the date-aligned output sequences (process = FALSE).
missing |
String.
Default: "*". The code used for missing values.
with.missing |
Logical.
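Default: TRUE. Whether missing values are kept in the output, for example as spells in SPELL output (see the ex1 examples below).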
right |
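One of "DEL" or NA. Default: "DEL". Whether void elements at the end of the sequences are deleted ("DEL") or treated as missing values (NA), for instance when converting to SPELL format.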
compressed |
Deprecated. Use compress instead.
nr |
Deprecated. Use missing instead.
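The SPS.in and SPS.out lists describe how each state-duration pair is written. The base-R sketch below shows one plausible reading, assuming that xfix is either empty or a two-character string holding the prefix and the suffix (as "()" does), and that sdsep separates the state from its duration. format_sps_token() and parse_sps_token() are hypothetical helpers for illustration, not TraMineR functions.

```r
# Hypothetical helpers (not part of TraMineR). Assumption: xfix is either ""
# or a two-character string "<prefix><suffix>", e.g. "()".
format_sps_token <- function(state, duration,
                             sps = list(xfix = "()", sdsep = ",")) {
  pre <- substr(sps$xfix, 1, 1)  # "" when xfix is ""
  suf <- substr(sps$xfix, 2, 2)
  paste0(pre, state, sps$sdsep, duration, suf)
}

parse_sps_token <- function(token, sps = list(xfix = "()", sdsep = ",")) {
  # Strip the prefix and suffix only when xfix defines them
  if (nchar(sps$xfix) == 2) {
    token <- substr(token, 2, nchar(token) - 1)
  }
  parts <- strsplit(token, sps$sdsep, fixed = TRUE)[[1]]
  list(state = parts[1], duration = as.integer(parts[2]))
}

format_sps_token("A", 3)                                # "(A,3)"
format_sps_token("A", 3, list(xfix = "", sdsep = "/"))  # "A/3"
str(parse_sps_token("(A,3)"))
```

With the default settings a token looks like "(A,3)"; with list(xfix = "", sdsep = "/") the same pair becomes "A/3", matching the compressed SPS example below.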
The seqformat function converts sequence data from one format to another. The input data is first converted into the STS format and then converted to the output format. Depending on the input and output formats, some information can be lost in the conversion process. The output is a matrix or a data frame, NOT a stslist state sequence object. To process, print, or plot the sequences with TraMineR functions, you must first transform the output into a stslist state sequence object with seqdef.
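The information loss mentioned above can be illustrated on a single sequence with base R's rle(). This is a sketch that mimics SPS-like and DSS-like views of one STS sequence; it is not how seqformat is implemented, and the bracketed SPS layout simply assumes the default SPS.out settings.

```r
# One sequence in STS form: one state per position
sts <- c("A", "A", "B", "B", "B", "C")

runs <- rle(sts)  # runs of identical successive states

# SPS-like view: each distinct successive state with its duration
sps <- paste0("(", runs$values, ",", runs$lengths, ")")
sps  # "(A,2)" "(B,3)" "(C,1)"

# DSS-like view: distinct successive states only, durations dropped
dss <- runs$values
dss  # "A" "B" "C"

# The SPS view keeps enough to rebuild the STS sequence exactly...
rebuilt <- rep(runs$values, runs$lengths)
identical(rebuilt, sts)  # TRUE

# ...while the DSS view does not: another STS sequence has the same DSS
other <- c("A", "B", "B", "B", "B", "C")
identical(rle(other)$values, dss)  # TRUE
```

Converting to DSS therefore discards durations, so a round trip DSS to STS cannot recover the original sequence.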
See Gabadinho et al. (2009) and Ritschard et al. (2009) for more
details on longitudinal data formats and converting between them.
When data are in SPELL format (from = "SPELL"), the begin and end times are expected to be positions in the sequences. Therefore, they should be strictly positive integers.
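Because spell boundaries are positions, one individual's spells map directly onto an STS vector. The sketch below uses a hypothetical helper, spells_to_sts(), that rejects non-positive or non-integer positions, lets later spells overwrite earlier ones, and leaves uncovered positions as NA. It only illustrates the idea and does not reproduce seqformat's handling of overwrite or fillblanks.

```r
# Hypothetical helper (not part of TraMineR): expand one individual's spells
# into an STS vector. begin/end must be strictly positive integer positions.
spells_to_sts <- function(begin, end, status, length_out = max(end), fill = NA) {
  stopifnot(all(begin >= 1), all(end >= begin),
            all(begin == round(begin)), all(end == round(end)))
  sts <- rep(fill, length_out)
  for (i in seq_along(begin)) {
    # In this sketch, later spells simply overwrite earlier ones
    sts[begin[i]:end[i]] <- status[i]
  }
  sts
}

spells_to_sts(begin = c(1, 3), end = c(2, 5), status = c("P", "L"))
# "P" "P" "L" "L" "L"

# A gap between spells stays as `fill` (NA here)
spells_to_sts(begin = c(1, 4), end = c(2, 5), status = c("P", "L"))
# "P" "P" NA  "L" "L"
```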
With process = TRUE, the output sequences are aligned on ages (process duration since birth), while with process = FALSE they are aligned on dates (positions in calendar time). If process = TRUE, values in the begin and end columns of data are assumed to be integer dates when pdata is not NULL, and ages otherwise. If process = FALSE, begin and end values are assumed to be dates when pdata is NULL, and ages otherwise.
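When pdata supplies a time origin for each individual, aligning dated spells on ages amounts to shifting each date by that origin. The sketch below uses toy values (not taken from bfspell20) and assumes that position 1 of the age-aligned output corresponds to the origin stored in pdata, which is consistent with naming the columns a15 to a30 when bfpdata20 holds the year at age 15. It illustrates the arithmetic only, not seqformat's internal code.

```r
# Toy spells in calendar years for one individual (hypothetical values)
spell <- data.frame(id = c(1, 1), begin = c(1975, 1980), end = c(1979, 1984),
                    states = c("P", "L"))

# Toy pdata: year at which individual 1 was aged 15 (hypothetical value)
origin <- data.frame(id = 1, when15 = 1975)

# Look up each spell's origin, then map dates to process positions so that
# the origin year becomes position 1 (assumption stated above)
t0 <- origin$when15[match(spell$id, origin$id)]
spell$pos_begin <- spell$begin - t0 + 1
spell$pos_end   <- spell$end   - t0 + 1
spell
# pos_begin: 1 6, pos_end: 5 10, i.e. ages 15-19 and 20-24
```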
A data frame when to is "SRS", "TSE", or "SPELL"; a matrix otherwise.
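A data frame suits TSE output because each individual can have a different number of events, so the result cannot be a rectangular one-row-per-sequence matrix. The base-R sketch below derives transition events from STS rows; sts_to_tse(), the "from>to" event labels, and the convention that an event's time is the position of the new state are all assumptions for illustration, not seqformat's or seqetm's exact output.

```r
# Three toy STS sequences, one per row
sts <- rbind(c("A", "A", "B", "B"),
             c("A", "C", "C", "C"),
             c("B", "B", "B", "B"))

# Hypothetical helper (not part of TraMineR): one row per transition
sts_to_tse <- function(sts) {
  out <- list()
  for (i in seq_len(nrow(sts))) {
    s <- sts[i, ]
    # Positions where the state differs from the previous position
    chg <- which(s[-1] != s[-length(s)]) + 1
    if (length(chg) > 0) {
      out[[length(out) + 1]] <- data.frame(
        id = i,
        time = chg,
        event = paste0(s[chg - 1], ">", s[chg]),
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, out)
}

sts_to_tse(sts)
# sequence 1 yields one event, sequence 2 one event, sequence 3 none
```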
Alexis Gabadinho, Pierre-Alexandre Fonta, Nicolas S. Müller, Matthias Studer, and Gilbert Ritschard.
Gabadinho, A., G. Ritschard, M. Studer and N. S. Müller (2009). Mining Sequence Data in R with the TraMineR package: A user's guide. Department of Econometrics and Laboratory of Demography, University of Geneva.
Ritschard, G., A. Gabadinho, M. Studer and N. S. Müller (2009). Converting between various sequence representations. In Ras, Z. and Dardzinska, A. (eds.), Advances in Data Management, vol. 223, pp. 155-175. Springer.
library(TraMineR)

## ========================================
## Examples with raw STS sequences as input
## ========================================

## Loading a data frame with sequence data in the columns 13 to 24
data(actcal)

## Converting to SPS format
actcal.SPS.A <- seqformat(actcal, 13:24, from = "STS", to = "SPS")
head(actcal.SPS.A)

## Converting to compressed SPS format with no
## prefix/suffix and with "/" as state/duration separator
actcal.SPS.B <- seqformat(actcal, 13:24, from = "STS", to = "SPS",
  compress = TRUE, SPS.out = list(xfix = "", sdsep = "/"))
head(actcal.SPS.B)

## Converting to compressed DSS format
actcal.DSS <- seqformat(actcal, 13:24, from = "STS", to = "DSS",
  compress = TRUE)
head(actcal.DSS)

## ==============================================
## Examples with a state sequence object as input
## ==============================================

## Loading a data frame with sequence data in the columns 10 to 25
data(biofam)

## Limiting the number of considered cases to the first 20
biofam <- biofam[1:20, ]

## Creating a state sequence object
biofam.labs <- c("Parent", "Left", "Married", "Left/Married", "Child",
  "Left/Child", "Left/Married/Child", "Divorced")
biofam.short.labs <- c("P", "L", "M", "LM", "C", "LC", "LMC", "D")
biofam.seq <- seqdef(biofam, 10:25, alphabet = 0:7,
  states = biofam.short.labs, labels = biofam.labs)

## Converting to SPELL format
bf.spell <- seqformat(biofam.seq, from = "STS", to = "SPELL",
  pdata = biofam, pvar = c("idhous", "birthyr"))
head(bf.spell)

## ======================================
## Examples with SPELL sequences as input
## ======================================

## Loading two data frames: bfspell20 and bfpdata20
## bfspell20 contains the first 20 biofam sequences in SPELL format
## bfpdata20 contains the IDs and the years at which the
## considered individuals were aged 15
data(bfspell)

## Converting to STS format with alignment on calendar years
bf.sts.y <- seqformat(bfspell20, from = "SPELL", to = "STS",
  id = "id", begin = "begin", end = "end", status = "states",
  process = FALSE)
head(bf.sts.y)

## Converting to STS format with alignment on ages
bf.sts.a <- seqformat(bfspell20, from = "SPELL", to = "STS",
  id = "id", begin = "begin", end = "end", status = "states",
  process = TRUE, pdata = bfpdata20, pvar = c("id", "when15"),
  limit = 16)
## colnames() renames the columns whether the result is a matrix or a data frame
colnames(bf.sts.a) <- paste0("a", 15:30)
head(bf.sts.a)

## ==================================
## Examples for TSE and SPELL output
## in presence of missing values
## ==================================

data(ex1) ## STS data with missing values

## Creating the state sequence object, with the end
## missings coded as void ('%') by default
sqex1 <- seqdef(ex1[, 1:13])
as.matrix(sqex1)

## Creating state-event transition matrices
ttrans <- seqetm(sqex1, method = "transition")
tstate <- seqetm(sqex1, method = "state")

## Converting into time stamped events
seqformat(sqex1, from = "STS", to = "TSE", tevent = ttrans)
seqformat(sqex1, from = "STS", to = "TSE", tevent = tstate)

## Converting into vertical spell data
seqformat(sqex1, from = "STS", to = "SPELL", with.missing = TRUE)
seqformat(sqex1, from = "STS", to = "SPELL", with.missing = TRUE, right = NA)
seqformat(sqex1, from = "STS", to = "SPELL", with.missing = FALSE)