seqHMM: build_mhmm – R documentation

Build a Mixture Hidden Markov Model

Description

Function `build_mhmm` constructs a mixture hidden Markov model object of class `mhmm`.

Usage

build_mhmm(observations, n_states, transition_probs, emission_probs,
  initial_probs, formula, data, coefficients, cluster_names = NULL,
  state_names = NULL, channel_names = NULL, ...)

Arguments

`observations`	An `stslist` object (see `seqdef`) containing the sequences, or a list of such objects (one for each channel).
`n_states`	A numeric vector giving the number of hidden states in each submodel (not used if starting values for model parameters are given with `initial_probs`, `transition_probs`, or `emission_probs`).
`transition_probs`	A list of matrices of transition probabilities for the submodel of each cluster.
`emission_probs`	A list containing, for the submodel of each cluster, a matrix of emission probabilities or a list of such matrices (one for each channel). Each matrix must have dimensions m x s, where m is the number of hidden states and s is the number of unique symbols (observed states) in the data. The columns should follow the ordering of the alphabet of observations (`alphabet(observations)`, returned as `symbol_names`).
`initial_probs`	A list which contains vectors of initial state probabilities for the submodel of each cluster.
`formula`	Covariates as an object of class `formula`, left side omitted.
`data`	An optional data frame, list or environment containing the variables in the model. If not found in data, the variables are taken from `environment(formula)`.
`coefficients`	An optional k x l matrix of regression coefficients for time-constant covariates for mixture probabilities, where k is the number of covariates and l is the number of clusters. A logit link is used for mixture probabilities. The first column is set to zero, so the first cluster acts as the reference.
`cluster_names`	A vector of optional names for the clusters.
`state_names`	A list of optional labels for the hidden states. If `NULL`, the state names are taken as row names of transition matrices. If this is also `NULL`, numbered states are used.
`channel_names`	A vector of optional names for the channels.
`...`	Additional arguments to `simulate_transition_probs`.
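
The role of `coefficients` can be illustrated with a small base-R sketch. It assumes a multinomial logit parameterisation in which the design matrix includes an intercept column and the first column of the coefficient matrix is zero. The names `X`, `gamma`, `eta`, and `mixture_probs`, as well as all numeric values, are hypothetical and are not part of seqHMM.

```r
# Hypothetical design matrix: intercept plus one binary covariate, 3 subjects
X <- cbind(intercept = 1, woman = c(0, 1, 1))

# Hypothetical k x l coefficient matrix (k = 2 columns of X, l = 3 clusters);
# the first column is zero, which makes cluster 1 the reference
gamma <- cbind(c(0, 0), c(0.5, -1.0), c(-0.2, 0.8))

# Linear predictors: one row per subject, one column per cluster
eta <- X %*% gamma

# Normalising exp(eta) over clusters gives each subject's
# prior cluster membership probabilities
mixture_probs <- exp(eta) / rowSums(exp(eta))

# Each row is a probability distribution over the clusters
rowSums(mixture_probs)
```

Because the first column is fixed at zero, the remaining columns are interpreted as log-odds relative to the first cluster.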

Details

The returned model contains attributes such as `nobs` and `df`, which give the number of observations in the model and the number of estimable model parameters. Both are used in computing BIC. When computing `nobs` for a multichannel model with C channels, each observed value in a single channel amounts to 1/C of an observation, i.e. a fully observed time point for a single sequence amounts to one observation. For the degrees of freedom `df`, zero probabilities of the initial model are treated as structural zeros and are not counted as estimable parameters.
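
The counting rules above can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: the channels are plain character matrices with `NA` for missing values (real `stslist` objects code missing values differently), the degrees of freedom are shown for a single transition matrix only, and `loglik` is a made-up value. All object names are hypothetical.

```r
# Two hypothetical channels: 3 sequences of length 4, NA marks a missing value
ch1 <- matrix(c("a", "b", "b", NA,
                "a", "a", "b", "b",
                "b", "b", NA,  NA), nrow = 3, byrow = TRUE)
ch2 <- matrix(c("x", "x", "y", "y",
                "x", "y", "y", "y",
                "x", "x", "y", NA), nrow = 3, byrow = TRUE)
channels <- list(ch1, ch2)
n_channels <- length(channels)

# Each observed value in one channel counts as 1 / C of an observation
nobs <- sum(sapply(channels, function(ch) sum(!is.na(ch)))) / n_channels

# Transition matrix with structural zeros (upper triangular)
A <- matrix(c(0.8, 0.2, 0,
              0,   0.9, 0.1,
              0,   0,   1), nrow = 3, byrow = TRUE)

# Assumed counting rule: nonzero entries minus one sum-to-one
# constraint per row; zeros are not estimated
df_A <- sum(A > 0) - nrow(A)

# BIC with a made-up log-likelihood, for illustration only
loglik <- -12.5
bic <- -2 * loglik + df_A * log(nobs)
```

Here the two channels hold 9 and 11 observed values, so `nobs` is 10, and the transition matrix contributes 2 free parameters.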

Value

An object of class `mhmm` with the following elements:

observations: State sequence object or a list of such containing the data.
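transition_probs: A list of matrices of transition probabilities, one for the submodel of each cluster.
emission_probs: A list which contains, for the submodel of each cluster, a matrix or a list of matrices (one for each channel) of emission probabilities.
initial_probs: A list of vectors of initial state probabilities, one for the submodel of each cluster.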
coefficients: A matrix of parameter coefficients for covariates (covariates in rows, clusters in columns).
X: Covariate values for each subject.
cluster_names: Names for clusters.
state_names: Names for hidden states.
symbol_names: Names for observed states.
channel_names: Names for channels of sequence data.
length_of_sequences: (Maximum) length of sequences.
n_sequences: Number of sequences.
n_symbols: Number of observed states (in each channel).
n_states: Number of hidden states (in each submodel).
n_channels: Number of channels.
n_covariates: Number of covariates.
n_clusters: Number of clusters.

References

Helske S. and Helske J. (2019). Mixture Hidden Markov Models for Sequence Data: The seqHMM Package in R, Journal of Statistical Software, 88(3), 1-32. doi:10.18637/jss.v088.i03

Examples

library(seqHMM)

data("biofam3c")

## Building sequence objects
marr_seq <- seqdef(biofam3c$married, start = 15,
  alphabet = c("single", "married", "divorced"))
child_seq <- seqdef(biofam3c$children, start = 15,
  alphabet = c("childless", "children"))
left_seq <- seqdef(biofam3c$left, start = 15,
  alphabet = c("with parents", "left home"))

## Choosing colors
attr(marr_seq, "cpal") <- c("#AB82FF", "#E6AB02", "#E7298A")
attr(child_seq, "cpal") <- c("#66C2A5", "#FC8D62")
attr(left_seq, "cpal") <- c("#A6CEE3", "#E31A1C")

## MHMM with random starting values, no covariates
set.seed(468)
init_mhmm_bf1 <- build_mhmm(
  observations = list(marr_seq, child_seq, left_seq),
  n_states = c(4, 4, 6),
  channel_names = c("Marriage", "Parenthood", "Residence"))
  
  
## Starting values for emission probabilities

# Cluster 1
B1_marr <- matrix(
  c(0.8, 0.1, 0.1, # High probability for single
    0.8, 0.1, 0.1,
    0.3, 0.6, 0.1, # High probability for married
    0.3, 0.3, 0.4), # High probability for divorced
  nrow = 4, ncol = 3, byrow = TRUE)

B1_child <- matrix(
  c(0.9, 0.1, # High probability for childless
    0.9, 0.1,
    0.9, 0.1,
    0.9, 0.1),
  nrow = 4, ncol = 2, byrow = TRUE)

B1_left <- matrix(
  c(0.9, 0.1, # High probability for living with parents
    0.1, 0.9, # High probability for having left home
    0.1, 0.9,
    0.1, 0.9),
  nrow = 4, ncol = 2, byrow = TRUE)

# Cluster 2

B2_marr <- matrix(
  c(0.8, 0.1, 0.1, # High probability for single
    0.8, 0.1, 0.1,
    0.1, 0.8, 0.1, # High probability for married
    0.7, 0.2, 0.1),
  nrow = 4, ncol = 3, byrow = TRUE)

B2_child <- matrix(
  c(0.9, 0.1, # High probability for childless
    0.9, 0.1,
    0.9, 0.1,
    0.1, 0.9),
  nrow = 4, ncol = 2, byrow = TRUE)

B2_left <- matrix(
  c(0.9, 0.1, # High probability for living with parents
    0.1, 0.9,
    0.1, 0.9,
    0.1, 0.9),
  nrow = 4, ncol = 2, byrow = TRUE)

# Cluster 3
B3_marr <- matrix(
  c(0.8, 0.1, 0.1, # High probability for single
    0.8, 0.1, 0.1,
    0.8, 0.1, 0.1,
    0.1, 0.8, 0.1, # High probability for married
    0.3, 0.4, 0.3,
    0.1, 0.1, 0.8), # High probability for divorced
  nrow = 6, ncol = 3, byrow = TRUE)

B3_child <- matrix(
  c(0.9, 0.1, # High probability for childless
    0.9, 0.1,
    0.5, 0.5,
    0.5, 0.5,
    0.5, 0.5,
    0.1, 0.9),
  nrow = 6, ncol = 2, byrow = TRUE)


B3_left <- matrix(
  c(0.9, 0.1, # High probability for living with parents
    0.1, 0.9,
    0.5, 0.5,
    0.5, 0.5,
    0.1, 0.9,
    0.1, 0.9),
  nrow = 6, ncol = 2, byrow = TRUE)

# Starting values for transition matrices
A1 <- matrix(
  c(0.80, 0.16, 0.03, 0.01,
    0,    0.90, 0.07, 0.03,
    0,    0,    0.90, 0.10,
    0,    0,    0,       1),
  nrow = 4, ncol = 4, byrow = TRUE)

A2 <- matrix(
  c(0.80, 0.10, 0.05, 0.03, 0.01, 0.01,
    0,    0.70, 0.10, 0.10, 0.05, 0.05,
    0,    0,    0.85, 0.01, 0.10, 0.04,
    0,    0,    0,    0.90, 0.05, 0.05,
    0,    0,    0,    0,    0.90, 0.10,
    0,    0,    0,    0,    0,       1),
  nrow = 6, ncol = 6, byrow = TRUE)

# Starting values for initial state probabilities
initial_probs1 <- c(0.9, 0.07, 0.02, 0.01)
initial_probs2 <- c(0.9, 0.04, 0.03, 0.01, 0.01, 0.01)

# Birth cohort
biofam3c$covariates$cohort <- cut(biofam3c$covariates$birthyr, c(1908, 1935, 1945, 1957))
biofam3c$covariates$cohort <- factor(
  biofam3c$covariates$cohort, labels=c("1909-1935", "1936-1945", "1946-1957"))

## MHMM with own starting values and covariates
init_mhmm_bf2 <- build_mhmm(
  observations = list(marr_seq, child_seq, left_seq),
  initial_probs = list(initial_probs1, initial_probs1, initial_probs2),
  transition_probs = list(A1, A1, A2),
  emission_probs = list(list(B1_marr, B1_child, B1_left),
    list(B2_marr, B2_child, B2_left),
    list(B3_marr, B3_child, B3_left)),
  formula = ~sex + cohort, data = biofam3c$covariates,
  cluster_names = c("Cluster 1", "Cluster 2", "Cluster 3"),
  channel_names = c("Marriage", "Parenthood", "Residence"),
  state_names = list(paste("State", 1:4), paste("State", 1:4),
                     paste("State", 1:6)))
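
Before passing user-defined starting values to `build_mhmm`, it can help to check that they satisfy the constraints described under Arguments. The following base-R sketch repeats three of the objects from the example so that it runs on its own. The helper `is_row_stochastic` is hypothetical and not part of seqHMM.

```r
# Hypothetical helper: TRUE if x is a matrix of non-negative values
# whose rows each sum to 1 (within a numerical tolerance)
is_row_stochastic <- function(x, tol = 1e-8) {
  is.matrix(x) && all(x >= 0) && all(abs(rowSums(x) - 1) < tol)
}

# Starting values copied from the example above
A1 <- matrix(
  c(0.80, 0.16, 0.03, 0.01,
    0,    0.90, 0.07, 0.03,
    0,    0,    0.90, 0.10,
    0,    0,    0,       1),
  nrow = 4, ncol = 4, byrow = TRUE)

B1_marr <- matrix(
  c(0.8, 0.1, 0.1,
    0.8, 0.1, 0.1,
    0.3, 0.6, 0.1,
    0.3, 0.3, 0.4),
  nrow = 4, ncol = 3, byrow = TRUE)

initial_probs1 <- c(0.9, 0.07, 0.02, 0.01)

# Transition and emission matrices must have rows summing to 1
ok_transition <- is_row_stochastic(A1)
ok_emission <- is_row_stochastic(B1_marr)

# Emission matrix is m x s: m hidden states, s = 3 marriage symbols
ok_dims <- nrow(B1_marr) == nrow(A1) && ncol(B1_marr) == 3

# Initial probabilities: one per hidden state, summing to 1
ok_initial <- length(initial_probs1) == nrow(A1) &&
  abs(sum(initial_probs1) - 1) < 1e-8
```

The same checks can be applied to the remaining matrices by looping over the lists passed as `transition_probs` and `emission_probs`.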

seqHMM

Mixture Hidden Markov Models for Social Sequence Data and Other Multivariate, Multichannel Categorical Time Series

v1.0.14

GPL (>= 2)

Authors

Jouni Helske [aut, cre] (<https://orcid.org/0000-0001-7130-793X>), Satu Helske [aut] (<https://orcid.org/0000-0003-0532-0153>)

Initial release

2019-10-21
