Fast Lags and Leads for Time Series and Panel Data
flag is an S3 generic to compute (sequences of) lags and leads. L and F are wrappers around flag representing the lag- and lead-operators, such that L(x,-1) = F(x,1) = F(x) and L(x,-3:3) = F(x,3:-3). L and F provide more flexibility than flag when applied to data frames (e.g. column subsetting, formula input and id-variable-preservation capabilities), but are otherwise identical.
(flag is more of a programmer's function in the style of the Fast Statistical Functions, while L and F are more practical to use in regression formulas or for computations on data frames.)
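The sign convention can be made concrete with a minimal base-R sketch of a shift operator. The helpers shift_vec, lag_op and lead_op are hypothetical stand-ins, not collapse functions. They only illustrate that a lag by -n is the same operation as a lead by n, as in L(x,-1) = F(x,1).

```r
# Hypothetical helper (not part of collapse): shift a vector by n positions.
# Positive n lags (values move to later positions), negative n leads.
shift_vec <- function(x, n = 1L, fill = NA) {
  len <- length(x)
  out <- rep(fill, len)
  if (abs(n) >= len) return(out)          # shifted completely out of range
  if (n > 0) {
    out[(n + 1):len] <- x[1:(len - n)]    # lag: element i takes x[i - n]
  } else if (n < 0) {
    m <- -n
    out[1:(len - m)] <- x[(m + 1):len]    # lead: element i takes x[i + m]
  } else {
    out <- x                              # n = 0 returns x unchanged
  }
  out
}
lag_op  <- function(x, n = 1L, fill = NA) shift_vec(x, n, fill)   # plays the role of L()
lead_op <- function(x, n = 1L, fill = NA) shift_vec(x, -n, fill)  # plays the role of F()

x <- c(10, 20, 30, 40)
lag_op(x)                                # NA 10 20 30
lead_op(x)                               # 20 30 40 NA
identical(lag_op(x, -1), lead_op(x, 1))  # TRUE
```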
flag(x, n = 1, ...)
L(x, n = 1, ...)
F(x, n = 1, ...)

## Default S3 method:
flag(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = TRUE, ...)
## Default S3 method:
L(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = TRUE, ...)
## Default S3 method:
F(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = TRUE, ...)

## S3 method for class 'matrix'
flag(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = length(n) > 1L, ...)
## S3 method for class 'matrix'
L(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = TRUE, ...)
## S3 method for class 'matrix'
F(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = TRUE, ...)

## S3 method for class 'data.frame'
flag(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = length(n) > 1L, ...)
## S3 method for class 'data.frame'
L(x, n = 1, by = NULL, t = NULL, cols = is.numeric, fill = NA, stubs = TRUE, keep.ids = TRUE, ...)
## S3 method for class 'data.frame'
F(x, n = 1, by = NULL, t = NULL, cols = is.numeric, fill = NA, stubs = TRUE, keep.ids = TRUE, ...)

# Methods for compatibility with plm:

## S3 method for class 'pseries'
flag(x, n = 1, fill = NA, stubs = TRUE, ...)
## S3 method for class 'pseries'
L(x, n = 1, fill = NA, stubs = TRUE, ...)
## S3 method for class 'pseries'
F(x, n = 1, fill = NA, stubs = TRUE, ...)

## S3 method for class 'pdata.frame'
flag(x, n = 1, fill = NA, stubs = length(n) > 1L, ...)
## S3 method for class 'pdata.frame'
L(x, n = 1, cols = is.numeric, fill = NA, stubs = TRUE, keep.ids = TRUE, ...)
## S3 method for class 'pdata.frame'
F(x, n = 1, cols = is.numeric, fill = NA, stubs = TRUE, keep.ids = TRUE, ...)

# Methods for grouped data frame / compatibility with dplyr:

## S3 method for class 'grouped_df'
flag(x, n = 1, t = NULL, fill = NA, stubs = length(n) > 1L, keep.ids = TRUE, ...)
## S3 method for class 'grouped_df'
L(x, n = 1, t = NULL, fill = NA, stubs = TRUE, keep.ids = TRUE, ...)
## S3 method for class 'grouped_df'
F(x, n = 1, t = NULL, fill = NA, stubs = TRUE, keep.ids = TRUE, ...)
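x | a vector / time series, (time series) matrix, data frame, panel series (plm::pseries), panel data frame (plm::pdata.frame) or grouped data frame (class 'grouped_df').
n | integer. A vector indicating the lags / leads to compute (passing negative integers to flag or L computes leads; passing negative integers to F computes lags).
g | a factor, or a vector or list of vectors identifying the groups / individuals of a panel. Non-factor input is grouped internally (see Details).
by | data.frame method: Same as g, but also allows one- or two-sided formulas, e.g. ~iso3c or PCGDP + LIFEEX ~ iso3c (see Examples).
t | same input as g/by, to indicate the time-variable(s) (see Details).
cols | data.frame method: Select columns to lag / lead using a function, column names, indices or a logical vector. Default: All numeric variables. Note: cols is ignored if a two-sided formula is passed to by.
fill | value to insert when vectors are shifted. Default is NA.
stubs | logical. TRUE will rename all lagged / leaded columns by adding a stub or prefix "Ln." / "Fn.".
keep.ids | data.frame / pdata.frame / grouped_df methods: Logical. If FALSE, all panel-identifiers (which includes all variables passed to by or t) are dropped from the output.
... | arguments to be passed to or from other methods.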
If a single integer is passed to n, and g/by and t are left empty, flag/L/F just returns x with all columns lagged / leaded by n. If length(n) > 1, and x is an atomic vector (time series), flag/L/F returns a (time series) matrix with lags / leads computed in the same order as passed to n. If instead x is a matrix / data frame, a matrix / data frame with ncol(x)*length(n) columns is returned, where columns are sorted first by variable and then by lag (so all lags computed on a variable are grouped together). x can be of any standard data type.
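The column ordering for several lags can be illustrated with a small base-R sketch. multi_lag and its "variable_lag" column names are hypothetical and chosen only for illustration; they are not collapse's naming scheme. The point is that the outer loop runs over variables and the inner loop over the elements of n, so all lags of one variable end up next to each other, in the order given in n.

```r
# Hypothetical helper: shift x by n positions (positive n = lag, negative n = lead).
shift_vec <- function(x, n, fill = NA) {
  src <- seq_along(x) - n                  # position each element is taken from
  src[src < 1L | src > length(x)] <- NA    # out-of-range positions get no value
  out <- x[src]                            # an NA index yields an NA element
  out[is.na(src)] <- fill
  out
}

# Hypothetical helper: all lags in n for every column, ordered by variable, then lag.
multi_lag <- function(m, n) {
  m <- as.matrix(m)
  out <- list()
  for (v in colnames(m)) {                 # outer loop: variables
    for (k in n) {                         # inner loop: lags in the order of n
      out[[paste0(v, "_", k)]] <- shift_vec(m[, v], k)
    }
  }
  do.call(cbind, out)
}

m <- cbind(a = 1:4, b = 11:14)
res <- multi_lag(m, c(0, 1))
colnames(res)   # "a_0" "a_1" "b_0" "b_1"
res[, "a_1"]    # NA 1 2 3
```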
With groups/panel-identifiers supplied to g/by, flag/L/F efficiently computes a panel-lag/lead by shifting the entire vector(s) but inserting fill elements in the right places. If t is left empty, the data needs to be ordered such that all values belonging to a group are consecutive and in time order. The groups themselves need not occur in any particular order. If a time-variable is supplied to t (or a list of time-variables uniquely identifying the time-dimension), the panel is fully identified and lags / leads can be securely computed even if the data is unordered.
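The idea of shifting the whole vector and then inserting fill wherever a shift would cross a group boundary can be sketched in base R as follows. This assumes the no-t case above: each group's observations are consecutive and in time order. panel_lag is a hypothetical helper, not collapse's compiled implementation.

```r
# Hypothetical helper: panel lag/lead on data where each group's rows are
# consecutive and in time order (groups may appear in any order).
panel_lag <- function(x, g, n = 1L, fill = NA) {
  len <- length(x)
  src <- seq_len(len) - n                  # shift the entire vector by n positions
  ok <- src >= 1L & src <= len             # the source position must exist...
  ok[ok] <- g[src[ok]] == g[ok]            # ...and belong to the same group
  out <- rep(fill, len)                    # every other slot receives fill
  out[ok] <- x[src[ok]]
  out
}

g <- c("A", "A", "A", "B", "B")
x <- c(1, 2, 3, 10, 20)
panel_lag(x, g)       # NA 1 2 NA 10
panel_lag(x, g, -1)   # 2 3 NA 20 NA  (a lead)
```

If a group's rows are not contiguous, the same-group check in this sketch would silently pair rows across blocks, which is why the ordering requirement matters.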
It is also possible to lag unordered or irregular time series utilizing only the t argument to identify the temporal dimension of the data.
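For a time series identified only by t, one simple way to get a lag that respects gaps and does not require sorted data is to look up, for each observation, the row whose time value is exactly n units earlier. The sketch below does this with base R's match(). It is an illustrative alternative, not the algorithm collapse uses, and it assumes t is numeric with unique values; time_lag is a hypothetical name.

```r
# Hypothetical helper: lag x by n time units using t, for unordered or irregular data.
time_lag <- function(x, t, n = 1, fill = NA) {
  if (anyDuplicated(t)) stop("t must uniquely identify the observations")
  src <- match(t - n, t)      # row holding the value observed n time units earlier
  out <- x[src]               # no match (start of series or a gap) gives NA
  out[is.na(src)] <- fill
  out
}

years <- c(2003, 2001, 2002, 2005)   # unordered, and 2004 is missing
x <- c(30, 10, 20, 50)
time_lag(x, years)       # 20 NA 10 NA
time_lag(x, years, -1)   # NA 20 30 NA
```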
Since v1.5.0, flag/L/F provide full built-in support for irregular time series and unbalanced panels. The previously suggested workaround using the seqid function is therefore no longer necessary.
Computationally, if both g/by and t are supplied, flag/L/F uses two initial passes to create an ordering through which the data are accessed. First pass: calculate the minimum and maximum time-value for each individual. Second pass: generate the ordering by placing the index of the current element into the vector slot obtained by adding the cumulative size of the preceding groups to the current time-value minus that individual's minimum time-value. This avoids sorting altogether and is therefore faster than sort-based methods. It delivers optimal performance if the panel-id supplied to g/by is already a factor variable and t is either an integer or a factor variable. If g/by is not a factor, or t is neither a factor nor an integer, qG or GRP will be called to group the respective identifier, which can be expensive. For optimal performance, prepare the identifiers beforehand (e.g. convert them to factors with qF, as in the Examples) or use plm classes.
The methods applying to plm objects (panel series and panel data frames) automatically utilize the factor panel-identifiers attached to these objects and thus securely and efficiently compute fully identified panel-lags. If these objects have more than two panel-identifiers attached to them, the last identifier is assumed to be the time-variable, and the others are taken as grouping-variables and interacted. Note that flag/L/F is significantly faster than plm::lag/plm::lead, since the latter are written in R and based on a split-apply-combine logic.
x lagged / leaded by the orders given in n, grouped by g/by and ordered by t. See Details and Examples.
library(collapse)

## Simple Time Series: AirPassengers
L(AirPassengers)                                  # 1 lag
F(AirPassengers)                                  # 1 lead

all_identical(L(AirPassengers),                   # 3 identical ways of computing 1 lag
              flag(AirPassengers),
              F(AirPassengers, -1))

head(L(AirPassengers, -1:3))                      # 1 lead and 3 lags - output as matrix

## Time Series Matrix of 4 EU Stock Market Indicators, 1991-1998
tsp(EuStockMarkets)                               # Data is recorded on 260 days per year
freq <- frequency(EuStockMarkets)
plot(stl(EuStockMarkets[,"DAX"], freq))           # There is some obvious seasonality
head(L(EuStockMarkets, -1:3 * freq))              # 1 annual lead and 3 annual lags
summary(lm(DAX ~., data = L(EuStockMarkets,-1:3*freq))) # DAX regressed on its own annual lead,
                                                        # lags and the lead/lags of the other series

## World Development Panel Data
head(flag(wlddev, 1, wlddev$iso3c, wlddev$year))  # This lags all variables
head(L(wlddev, 1, ~iso3c, ~year))                 # This lags all numeric variables
head(L(wlddev, 1, ~iso3c))                        # Without t: Works because data is ordered
head(L(wlddev, 1, PCGDP + LIFEEX ~ iso3c, ~year)) # This lags GDP per Capita & Life Expectancy
head(L(wlddev, 0:2, ~ iso3c, ~year, cols = 9:10)) # Same, also retaining original series
head(L(wlddev, 1:2, PCGDP + LIFEEX ~ iso3c, ~year, # Two lags, dropping id columns
       keep.ids = FALSE))

# Different ways of regressing GDP on its lags and Life Expectancy and its lags
summary(lm(PCGDP ~ ., L(wlddev, 0:2, ~iso3c, ~year, 9:10, keep.ids = FALSE)))  # 1 - Precomputing
summary(lm(PCGDP ~ L(PCGDP,1:2,iso3c,year) + L(LIFEEX,0:2,iso3c,year), wlddev)) # 2 - Ad-hoc
summary(lm(PCGDP ~ L(PCGDP,1:2,iso3c) + L(LIFEEX,0:2,iso3c), wlddev))          # 3 - Same, without year
g = qF(wlddev$iso3c); t = qF(wlddev$year)                                       # 4 - Precomputing
summary(lm(PCGDP ~ L(PCGDP,1:2,g,t) + L(LIFEEX,0:2,g,t), wlddev))              #     panel-ids

## Using plm:
pwlddev <- plm::pdata.frame(wlddev, index = c("iso3c","year"))
head(L(pwlddev, 0:2, 9:10))                       # Again 2 lags of GDP and LIFEEX
PCGDP <- pwlddev$PCGDP                            # A panel series of GDP per Capita
head(L(PCGDP))                                    # Lagging the panel series
summary(lm(PCGDP ~ ., L(pwlddev, 0:2, 9:10, keep.ids = FALSE)))  # Running the lm again

# THIS DOES NOT WORK: -> a pseries is only created when subsetting the pdata.frame using $ or [[
summary(lm(PCGDP ~ L(PCGDP,1:2) + L(LIFEEX,0:2), pwlddev))      # ..so L.default is used here..
LIFEEX <- pwlddev$LIFEEX                                         # To make it work, create pseries
summary(lm(PCGDP ~ L(PCGDP,1:2) + L(LIFEEX,0:2)))               # THIS WORKS !

## Using dplyr:
library(dplyr)
wlddev %>% group_by(iso3c) %>% select(PCGDP,LIFEEX) %>% L(0:2)
wlddev %>% group_by(iso3c) %>% select(year,PCGDP,LIFEEX) %>% L(0:2,year)  # Also using t (safer)