Generate Group-Id from Integer Sequences
seqid can be used to group sequences of integers in a vector, e.g. seqid(c(1:3, 5:7)) becomes c(rep(1,3), rep(2,3)). It also supports increments > 1, unordered sequences, and missing values in the sequence.
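The grouping rule can be illustrated with a short base R sketch. The helper seq_group below is hypothetical and is not collapse's implementation; it assumes a non-empty vector without missing values and opens a new group whenever the step between neighbouring values differs from del. The guarded call at the end simply runs the documented function on the first input if collapse is installed.

```r
# Hypothetical base R helper (an illustration, not collapse's implementation).
# Assumes x is non-empty and has no missing values.
seq_group <- function(x, o = seq_along(x), del = 1L, start = 1L) {
  xo <- as.integer(x[o])                     # visit x in the order given by o
  id <- cumsum(c(TRUE, diff(xo) != del))     # new group when the step is not del
  out <- integer(length(x))
  out[o] <- id + (start - 1L)                # write ids back to original positions
  out
}

a <- seq_group(c(1:3, 5:7))                  # 1 1 1 2 2 2
b <- seq_group(c(1, 3, 5, 8, 10), del = 2L)  # 1 1 1 2 2

# Unordered input: pass an ordering vector so values are visited as 1,2,3,5,6,7
x <- c(5, 1, 6, 2, 7, 3)
u <- seq_group(x, o = order(x))              # 2 1 2 1 2 1

# If collapse is installed, try the documented function on the same input
if (requireNamespace("collapse", quietly = TRUE)) {
  print(collapse::seqid(c(1:3, 5:7)))
}
```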
Some applications are to facilitate identification of, and grouped operations on, (irregular) time series and panels.
seqid(x, o = NULL, del = 1L, start = 1L, na.skip = FALSE, skip.seq = FALSE, check.o = TRUE)
x | a factor or integer vector. Numeric vectors will be converted to integer, i.e. rounded downwards.
o | an (optional) integer ordering vector specifying the order by which to pass through
del | integer. The integer delimiting two consecutive points in a sequence.
start | integer. The starting value of the resulting sequence id. Default is starting from 1. For C++ programmers, starting from 0 could be a better choice.
na.skip | logical. Skip missing values in the sequence. The default behavior is skipping such that
skip.seq | logical. If
check.o | logical. Programmers option:
seqid was created primarily as a workaround to deal with problems of computing lagged values, differences and growth rates on irregularly spaced time series and panels before collapse version 1.5.0 (#26). Now flag, fdiff and fgrowth natively support irregular data, so this workaround is superfluous, except for iterated differencing, which is not yet supported with irregular data.
The theory of the workaround was to express an irregular time series or panel series as a regular panel series with a group-id created such that the time-periods within each group are consecutive. seqid makes this very easy: for an irregular panel with some gaps or repeated values in the time variable, an appropriate id variable can be generated using settransform(data, newid = seqid(time, radixorder(id, time))). Lags can then be computed using L(data, 1, ~newid, ~time) etc.
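To make the mechanics of the workaround concrete, here is a base R sketch that builds such an id by hand and lags within it. The cumsum/diff expression is a stand-in assumed to mirror what seqid(time, order(id, time)) produces on this data, and ave() stands in for the lag operator; neither is collapse code. The value column is deterministic so the result is easy to read.

```r
# Irregular panel: id 2 has no observation at time 3
data <- data.frame(id    = rep(1:3, each = 4),
                   time  = c(1:4, 1:2, 4:5, 1:4),
                   value = seq(10, 120, by = 10))

# Stand-in for seqid(time, order(id, time)): pass through time in (id, time)
# order and open a new group whenever the step is not 1
o <- order(data$id, data$time)
newid <- integer(nrow(data))
newid[o] <- cumsum(c(TRUE, diff(data$time[o]) != 1L))
data$newid <- newid                      # 1 1 1 1 2 2 3 3 4 4 4 4

# Lag by one row within newid (rows are already sorted by time within group)
data$lag1 <- ave(data$value, data$newid,
                 FUN = function(v) c(NA, v[-length(v)]))
data
```

Because id 2 is split into two groups at the gap, the lag at time 4 for that unit is NA rather than the value observed at time 2.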
In general, for any regularly spaced panel the identity given by identical(groupid(id, order(id, time)), seqid(time, order(id, time))) should hold.
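The reasoning behind this identity can be checked with base R stand-ins. The helpers run_id and seq_id are hypothetical names that are intended to mimic groupid(id, o) and seqid(time, o) respectively; they are not the collapse functions and ignore missing values.

```r
# Hypothetical stand-ins: run_id mimics groupid(id, o), seq_id mimics seqid(time, o)
run_id <- function(id, o) {
  out <- integer(length(id))
  out[o] <- cumsum(c(TRUE, diff(id[o]) != 0L))    # new group when id changes
  out
}
seq_id <- function(time, o) {
  out <- integer(length(time))
  out[o] <- cumsum(c(TRUE, diff(time[o]) != 1L))  # new group when step is not 1
  out
}

id <- rep(1:3, each = 4)

# Regularly spaced panel: time restarts exactly where id changes
time <- rep(1:4, times = 3)
o <- order(id, time)
regular_same <- identical(run_id(id, o), seq_id(time, o))          # TRUE

# Gap for id 2 (no time 3): the sequence id splits that unit in two
time_gap <- c(1:4, 1:2, 4:5, 1:4)
o_gap <- order(id, time_gap)
gap_same <- identical(run_id(id, o_gap), seq_id(time_gap, o_gap))  # FALSE
```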
Regularly spaced panels with gaps in time (such as a panel-survey with measurements every 2 years) can be handled either by seqid(..., del = gap) or, in most cases, simply by converting the time variable to factor using qF, which will make observations consecutive.
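Both options can be sketched in base R for a survey run every 2 years. This is an illustration under assumptions: the cumsum/diff expressions stand in for seqid, and base factor() stands in for qF.

```r
id   <- rep(1:2, each = 4)
time <- rep(seq(2000L, 2006L, by = 2L), times = 2)  # surveyed every 2 years

# Option 1: treat a step of 2 as consecutive (the role of del = gap)
id_del <- cumsum(c(TRUE, diff(time) != 2L))          # 1 1 1 1 2 2 2 2

# Option 2: replace time by its factor codes, which are consecutive integers
time_code <- as.integer(factor(time))                # 1 2 3 4 1 2 3 4
id_fct <- cumsum(c(TRUE, diff(time_code) != 1L))     # 1 1 1 1 2 2 2 2
```

Factor codes only reflect the time values that actually occur in the data, so a period that is missing for every unit leaves no gap in the codes.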
There are potentially other more analytical applications for seqid...
For the opposite operation of creating a new time-variable that is consecutive in each group, see data.table::rowid.
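As a rough sketch of that opposite operation, a within-group row counter can be built in base R with ave(). The variable name newtime is illustrative; the guarded comparison against data.table::rowid runs only if data.table is installed.

```r
id   <- rep(1:3, each = 4)
time <- c(1:4, 1:2, 4:5, 1:4)   # id 2 has a gap at time 3

# Within-group row counter: 1, 2, 3, ... for each id, in row order
newtime <- ave(seq_along(id), id, FUN = seq_along)

# Optional comparison with data.table's row counter
if (requireNamespace("data.table", quietly = TRUE)) {
  print(identical(data.table::rowid(id), newtime))
}
```

The counter follows row order, so the data should be sorted by time within each id before it is used as a time variable.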
An integer vector of class 'qG'. See qG.
library(collapse)

## This creates an irregularly spaced panel, with a gap in time for id = 2
data <- data.frame(id = rep(1:3, each = 4),
                   time = c(1:4, 1:2, 4:5, 1:4),
                   value = rnorm(12))
data

## This gave a 'gaps in time' error prior to collapse 1.5.0
L(data, 1, value ~ id, ~time)

## Generating new id variable (here seqid(time) would suffice as data is sorted)
settransform(data, newid = seqid(time, order(id, time)))
data

## Lag the panel this way
L(data, 1, value ~ newid, ~time)

## A different possibility: Creating a consecutive time variable
settransform(data, newtime = data.table::rowid(id))
data
L(data, 1, value ~ id, ~newtime)

## With sorted data, the time variable can also just be omitted
L(data, 1, value ~ id)