Logarithm of the probabilities of state sequences
Compute the logarithm of the probability of each state sequence obtained from a state transition model. The probability of a sequence is the product of the probabilities of its successive states. Several methods are available to compute a state probability.
seqlogp(seqdata, prob="trate", time.varying=TRUE, begin="freq", weighted=TRUE)
seqdata |
The sequences whose probabilities are to be computed. |
prob |
either the name ( |
time.varying |
Logical. If |
begin |
Model used to compute the probability of the first state. Either |
weighted |
Logical. If |
The sequence likelihood P(s) is defined as the product of the probabilities with which each of its successive observed states is supposed to occur at its position. Let s=s_1s_2 ... s_l be a sequence of length l. Then
P(s)=P(s_1, 1) * P(s_2, 2) * ... * P(s_l, l)
where P(s_t,t) is the probability of observing state s_t at position t.
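As a small illustration of the product above, the following base R sketch looks up P(s_t,t) in a table of position-specific state probabilities and multiplies the values. The matrix `P` and the sequence `s` are made-up values for illustration only; they are not produced by seqlogp.

```r
# Hypothetical probabilities P(state, position): rows are states,
# columns are positions 1..3 (each column sums to 1)
P <- matrix(c(0.6, 0.4,
              0.5, 0.5,
              0.2, 0.8),
            nrow = 2,
            dimnames = list(c("A", "B"), NULL))

# Hypothetical sequence s = s_1 s_2 s_3
s <- c("A", "B", "B")

# P(s_t, t) for each position t, picked by (state row, position column)
p_t <- P[cbind(match(s, rownames(P)), seq_along(s))]

# Sequence likelihood P(s) = P(s_1, 1) * P(s_2, 2) * P(s_3, 3)
P_s <- prod(p_t)
```

Here `p_t` holds 0.6, 0.5 and 0.8, so `P_s` is their product.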
The question is how to determine the state probabilities P(s_t,t). Several methods are available and can be selected with the prob argument.
One commonly used method is to postulate a Markov model, which can be of various orders. We can consider probabilities derived from the first-order Markov model, in which each P(s_t,t), t>1, is set to the transition rate p(s_t|s_(t-1)). This is available in seqlogp by setting prob="trate".
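The first-order Markov approach can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used by seqlogp: the toy matrix `seqs` and all variable names are hypothetical, transition rates are pooled over positions, and the first state is given its observed frequency at position 1.

```r
# Toy data (hypothetical): 3 sequences of length 4 over states "A" and "B"
seqs <- rbind(c("A", "A", "B", "B"),
              c("A", "B", "B", "B"),
              c("A", "A", "A", "B"))
states <- c("A", "B")
l <- ncol(seqs)

# Pair each state at position t-1 with the state at position t (t = 2..l)
from <- factor(as.vector(seqs[, -l]), levels = states)
to   <- factor(as.vector(seqs[, -1]), levels = states)

# Transition rates p(s_t | s_(t-1)): row-normalised transition counts
trate <- prop.table(table(from, to), margin = 1)

# Observed frequency of each state at position 1
first_freq <- sapply(states, function(a) mean(seqs[, 1] == a))

# -log(P(s)) for each sequence
neglogp <- apply(seqs, 1, function(s) {
  p <- first_freq[[s[1]]]
  for (pos in 2:l) p <- p * trate[s[pos - 1], s[pos]]
  -log(p)
})
```

With these toy data, the second sequence, which leaves "A" immediately and then stays in "B", gets the smallest -log(P(s)), because the estimated rate for staying in "B" is 1.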
The transition rates may be considered constant over time/positions (time.varying=FALSE), that is, estimated across sequences from the observations at positions t and t-1 for all t together. Time-varying transition rates may also be considered (time.varying=TRUE), in which case they are computed separately for each position, that is, estimated across sequences from the observations at positions t and t-1 for each t, yielding an array of transition matrices. Users may also specify their own transition rate array or matrix.
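To make the idea of an array of transition matrices concrete, here is a minimal base R sketch that estimates one transition matrix per position from toy data. The data and the name `trate_tv` are hypothetical, and the layout of the array (states x states x positions) is an assumption of this sketch rather than a statement about the format seqlogp uses internally.

```r
# Toy data (hypothetical): 3 sequences of length 4 over states "A" and "B"
seqs <- rbind(c("A", "A", "B", "B"),
              c("A", "B", "B", "B"),
              c("A", "A", "A", "B"))
states <- c("A", "B")
l <- ncol(seqs)

# One transition matrix per position t = 2..l; slice t-1 holds the rates
# estimated from the observations at positions t-1 and t only
trate_tv <- array(NA_real_, dim = c(length(states), length(states), l - 1),
                  dimnames = list(states, states, NULL))
for (pos in 2:l) {
  counts <- table(factor(seqs[, pos - 1], levels = states),
                  factor(seqs[, pos], levels = states))
  # Rows of states never observed at position pos-1 come out as NaN (0/0)
  trate_tv[, , pos - 1] <- prop.table(counts, margin = 1)
}
```

In this toy example the rate of staying in "A" is 2/3 between positions 1 and 2 but 0 between positions 3 and 4, a difference that a single pooled matrix would average away.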
Another method is to use the frequency of a state at each position to set P(s_t,t) (prob="freq"). In this case, the probability of a sequence does not depend on the transition probabilities. Here again, the frequencies can be computed over all positions together (time.varying=FALSE) or separately for each position t (time.varying=TRUE).
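The frequency-based approach can be sketched in base R as follows. Again, this is only an illustration on hypothetical toy data with hypothetical variable names; it does not reproduce seqlogp's implementation and ignores weights.

```r
# Toy data (hypothetical): 3 sequences of length 4 over states "A" and "B"
seqs <- rbind(c("A", "A", "B", "B"),
              c("A", "B", "B", "B"),
              c("A", "A", "A", "B"))
states <- c("A", "B")
l <- ncol(seqs)

# Position-specific frequencies: one row per position, one column per state
freq_tv <- sapply(states, function(a) colMeans(seqs == a))

# Frequencies pooled over all positions: one value per state
freq_pooled <- sapply(states, function(a) mean(seqs == a))

# -log(P(s)) using the position-specific frequencies as P(s_t, t)
neglogp_tv <- apply(seqs, 1, function(s) {
  p_t <- freq_tv[cbind(seq_len(l), match(s, states))]
  -sum(log(p_t))
})
```

Because only the state frequencies enter the product, two sequences that visit equally frequent states at each position receive the same value, whatever transitions they contain.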
For t=1, P(s_1,1) is set to the observed frequency of state s_1 at position 1. Alternatively, the begin argument allows the user to specify the probability of the first state.
Because the likelihood P(s) is generally very small, seqlogp returns -log(P(s)). This quantity reaches its minimum, 0, when P(s) is equal to 1.
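The following base R sketch illustrates why working on the -log scale is convenient for long sequences. The probabilities are made-up values chosen so that the raw product falls below the smallest representable double.

```r
# Hypothetical sequence of 200 positions, each state with probability 0.01
p <- rep(0.01, 200)

# The raw likelihood is 1e-400, which underflows to 0 in double precision
P_s <- prod(p)

# Summing -log(p) gives the same information without underflow
neg_log_P_s <- sum(-log(p))

# A sequence made only of certain states (probability 1) gives exactly 0
neg_log_certain <- sum(-log(rep(1, 200)))
```

On the raw scale all sufficiently unlikely sequences collapse to 0 and can no longer be compared, whereas their -log values remain finite and ordered.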
A vector containing -log(P(s)), the negative logarithm of the probability, for each sequence.
Matthias Studer and Alexis Gabadinho (with Gilbert Ritschard for the help page)
library(TraMineR)

## Creating the sequence objects using weights
data(biofam)
biofam.seq <- seqdef(biofam, 10:25, weights=biofam$wp00tbgs)

## Computing sequence probabilities
biofam.prob <- seqlogp(biofam.seq)

## Comparing the probability of each cohort
cohort <- biofam$birthyr>1940
boxplot(biofam.prob~cohort)