Fast (Quasi-, Log-) Differences for Time Series and Panel Data
fdiff
is an S3 generic to compute (sequences of) suitably lagged / leaded and iterated differences, quasi-differences, log-differences or quasi-log-differences. The difference and log-difference operators D
and Dlog
also exist as parsimonious wrappers around fdiff
, providing more flexibility than fdiff
when applied to data frames.
fdiff(x, n = 1, diff = 1, ...)
D(x, n = 1, diff = 1, ...)
Dlog(x, n = 1, diff = 1, ...)

## Default S3 method:
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
      stubs = TRUE, ...)
## Default S3 method:
D(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1,
  stubs = TRUE, ...)
## Default S3 method:
Dlog(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1,
     stubs = TRUE, ...)

## S3 method for class 'matrix'
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
      stubs = length(n) + length(diff) > 2L, ...)
## S3 method for class 'matrix'
D(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1,
  stubs = TRUE, ...)
## S3 method for class 'matrix'
Dlog(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, rho = 1,
     stubs = TRUE, ...)

## S3 method for class 'data.frame'
fdiff(x, n = 1, diff = 1, g = NULL, t = NULL, fill = NA, log = FALSE, rho = 1,
      stubs = length(n) + length(diff) > 2L, ...)
## S3 method for class 'data.frame'
D(x, n = 1, diff = 1, by = NULL, t = NULL, cols = is.numeric,
  fill = NA, rho = 1, stubs = TRUE, keep.ids = TRUE, ...)
## S3 method for class 'data.frame'
Dlog(x, n = 1, diff = 1, by = NULL, t = NULL, cols = is.numeric,
     fill = NA, rho = 1, stubs = TRUE, keep.ids = TRUE, ...)

# Methods for compatibility with plm:

## S3 method for class 'pseries'
fdiff(x, n = 1, diff = 1, fill = NA, log = FALSE, rho = 1, stubs = TRUE, ...)
## S3 method for class 'pseries'
D(x, n = 1, diff = 1, fill = NA, rho = 1, stubs = TRUE, ...)
## S3 method for class 'pseries'
Dlog(x, n = 1, diff = 1, fill = NA, rho = 1, stubs = TRUE, ...)

## S3 method for class 'pdata.frame'
fdiff(x, n = 1, diff = 1, fill = NA, log = FALSE, rho = 1,
      stubs = length(n) + length(diff) > 2L, ...)
## S3 method for class 'pdata.frame'
D(x, n = 1, diff = 1, cols = is.numeric, fill = NA, rho = 1,
  stubs = TRUE, keep.ids = TRUE, ...)
## S3 method for class 'pdata.frame'
Dlog(x, n = 1, diff = 1, cols = is.numeric, fill = NA, rho = 1,
     stubs = TRUE, keep.ids = TRUE, ...)

# Methods for grouped data frame / compatibility with dplyr:

## S3 method for class 'grouped_df'
fdiff(x, n = 1, diff = 1, t = NULL, fill = NA, log = FALSE, rho = 1,
      stubs = length(n) + length(diff) > 2L, keep.ids = TRUE, ...)
## S3 method for class 'grouped_df'
D(x, n = 1, diff = 1, t = NULL, fill = NA, rho = 1,
  stubs = TRUE, keep.ids = TRUE, ...)
## S3 method for class 'grouped_df'
Dlog(x, n = 1, diff = 1, t = NULL, fill = NA, rho = 1,
     stubs = TRUE, keep.ids = TRUE, ...)
x |
a numeric vector / time series, (time series) matrix, data frame, panel series ( |
n |
integer. A vector indicating the number of lags or leads. |
diff |
integer. A vector of integers >= 1 indicating the order of differencing / log-differencing. |
g |
a factor, |
by |
data.frame method: Same as |
t |
same input as |
cols |
data.frame method: Select columns to difference using a function, column names, indices or a logical vector. Default: All numeric variables. Note: |
fill |
value to insert when vectors are shifted. Default is |
log |
logical. |
rho |
double. Autocorrelation parameter. Set to a value between 0 and 1 for quasi-differencing. Any numeric value can be supplied. |
stubs |
logical. |
keep.ids |
data.frame / pdata.frame / grouped_df methods: Logical. Drop all panel-identifiers from the output (which includes all variables passed to |
... |
arguments to be passed to or from other methods. |
By default, fdiff/D/Dlog
return x
with all columns differenced / log-differenced. Differences are computed as repeat(diff) x[i] - rho*x[i-n]
, and log-differences as repeat(diff) log(x[i]) - rho*log(x[i-n])
. If rho < 1
, this becomes quasi- (or partial) differencing, which is a technique suggested by Cochrane and Orcutt (1949) to deal with serial correlation in regression models, where rho
is typically estimated by running a regression of the model residuals on the lagged residuals. Setting diff = 2
returns differences of differences, and so on, while setting n = 2
returns simple differences computed by subtracting twice-lagged x
from x
. It is also possible to compute forward differences by passing negative n
values. n
also supports arbitrary vectors of integers (lags), and diff
supports positive sequences of integers (differences).
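The rule described above can be sketched in base R. This is only an illustration of the stated formula, not the implementation behind fdiff; the helper names qdiff_once and qdiff are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): one pass of x[i] - rho * x[i - n].
# Positive n looks back (lag); negative n looks forward (lead).
qdiff_once <- function(x, n = 1L, rho = 1, fill = NA_real_) {
  len <- length(x)
  idx <- seq_len(len) - n             # position of the value to subtract
  idx[idx < 1L | idx > len] <- NA     # positions that fall outside the series
  out <- x - rho * x[idx]
  out[is.na(idx)] <- fill             # shifted-in positions receive `fill`
  out
}

# Repeat the pass `diff` times; log = TRUE differences log(x) instead of x.
qdiff <- function(x, n = 1L, diff = 1L, rho = 1, log = FALSE) {
  if (log) x <- log(x)
  for (k in seq_len(diff)) x <- qdiff_once(x, n, rho)
  x
}

x <- c(1, 4, 9, 16, 25)
qdiff(x)               # first difference:       NA 3 5 7 9
qdiff(x, diff = 2)     # difference of the above: NA NA 2 2 2
qdiff(x, n = 2)        # twice-lagged difference: NA NA 8 12 16
qdiff(x, n = -1)       # forward difference:      -3 -5 -7 -9 NA
qdiff(x, rho = 0.5)    # quasi-difference:        NA 3.5 7 11.5 17
```

With rho = 1 and log = FALSE the sketch reduces to ordinary differencing, so qdiff(x) agrees with c(NA, diff(x)) from base R.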
If more than one value is passed to n
and/or diff
, the data is expanded-wide as follows: If x
is an atomic vector or time series, a (time series) matrix is returned with columns ordered first by lag, then by difference. If x
is a matrix or data frame, each column is expanded in like manner such that the output has ncol(x)*length(n)*length(diff)
columns ordered first by column name, then by lag, then by difference.
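The column ordering just described can be made concrete with a small base R sketch. The labels below are purely hypothetical and chosen for readability; the names that fdiff actually assigns (see the stubs argument) may differ.

```r
# Hypothetical inputs: two columns, two lags, two orders of differencing.
cols  <- c("a", "b")
n     <- c(1, 2)
diffs <- 1:2

# expand.grid varies its first argument fastest, so listing diff first
# yields an ordering by column, then by lag, then by difference.
grid   <- expand.grid(diff = diffs, lag = n, col = cols,
                      stringsAsFactors = FALSE)
labels <- paste0(grid$col, ".lag", grid$lag, ".diff", grid$diff)
labels

# The number of output columns matches ncol(x) * length(n) * length(diff).
length(labels) == length(cols) * length(n) * length(diffs)
```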
With groups/panel-identifiers supplied to g/by
, fdiff/D/Dlog
efficiently compute panel-differences. If t
is left empty, the data needs to be ordered such that all values belonging to a group are consecutive and in the right order. It is not necessary that the groups themselves occur in the right order. If time-variable(s) are supplied to t
, the panel is fully identified and differences can be securely computed even if the data is unordered.
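To illustrate why a time variable makes the computation safe on unordered data, here is a minimal base R sketch of a grouped first difference. It assumes a toy panel with regularly spaced years and no gaps; the data and column names are made up, and the approach (sort, difference within groups, restore row order) is an illustration rather than the algorithm used by fdiff.

```r
# Toy unordered panel; all names and values are illustrative.
d <- data.frame(
  id   = c("B", "A", "A", "B", "A", "B"),
  year = c(2001, 2002, 2000, 2000, 2001, 2002),
  y    = c(12, 7, 1, 10, 3, 15)
)

# Sort by group and time, take first differences within each group,
# then write the results back in the original row order.
o  <- order(d$id, d$year)
dy <- ave(d$y[o], d$id[o], FUN = function(v) c(NA, diff(v)))
d$D1.y    <- NA_real_
d$D1.y[o] <- dy
d
```

Without the year column, the same result would only be obtained if the rows of each group were already consecutive and in time order.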
fdiff/D/Dlog
support balanced panels and unbalanced panels where different individuals are observed for different time-sequences.
For computational details and efficiency considerations see the help page for flag
.
It is also possible to compute differences on unordered vectors or irregular time series (thus utilizing t
but leaving g/by
empty).
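A rough base R sketch of differencing an irregular, unordered series by its time variable follows. It rests on an assumption that is not stated on this page: that the lag of the observation at time t is the observation at time t - 1, and that the difference is missing when that time point is absent. How fdiff treats such gaps may depend on the package version.

```r
# Hypothetical irregular series: time point 3 is missing and rows are shuffled.
t <- c(5, 1, 2, 4)
x <- c(50, 10, 20, 40)

# Assumption: look up the previous value by time, not by row position.
prev <- match(t - 1, t)   # row holding time t - 1, NA when that time is absent
d1   <- x - x[prev]
data.frame(t, x, d1)
```

Under this reading, the observations at times 1 and 4 have no predecessor one step earlier, so their differences are NA.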
The methods applying to plm objects (panel series and panel data frames) automatically utilize the panel-identifiers attached to these objects and thus securely compute fully identified panel-differences. If these objects have > 2 panel-identifiers attached to them, the last identifier is assumed to be the time-variable, and the others are taken as grouping-variables and interacted.
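The interaction of several grouping identifiers can be sketched in base R as follows. The identifiers and values are made up, and the use of interaction() and ave() is only an illustration of the idea, not a description of how the plm methods are implemented.

```r
# Illustrative identifiers; the last one (year) plays the role of time.
ids <- data.frame(
  country = c("X", "X", "X", "X"),
  sector  = c("s1", "s1", "s2", "s2"),
  year    = c(2000, 2001, 2000, 2001)
)
y <- c(1, 4, 10, 12)

# All identifiers except the last are interacted into one grouping factor.
g <- interaction(ids[-ncol(ids)], drop = TRUE)
levels(g)

# First differences within each interacted group (rows already time-ordered).
d1 <- ave(y, g, FUN = function(v) c(NA, diff(v)))
d1
```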
x
differenced diff
times using lags n
of itself. Quasi- and log-differences are toggled by the rho
and log
arguments or the Dlog
operator. Computations can be grouped by g/by
and/or ordered by t
. See Details and Examples.
Cochrane, D.; Orcutt, G. H. (1949). Application of Least Squares Regression to Relationships Containing Auto-Correlated Error Terms. Journal of the American Statistical Association. 44 (245): 32-61.
Prais, S. J.; Winsten, C. B. (1954). Trend Estimators and Serial Correlation. Cowles Commission Discussion Paper No. 383. Chicago.
## Simple Time Series: AirPassengers
D(AirPassengers)                      # 1st difference, same as fdiff(AirPassengers)
D(AirPassengers, -1)                  # Forward difference
Dlog(AirPassengers)                   # Log-difference
D(AirPassengers, 1, 2)                # Second difference
Dlog(AirPassengers, 1, 2)             # Second log-difference
D(AirPassengers, 12)                  # Seasonal difference (data is monthly)
D(AirPassengers,                      # Quasi-difference, see a better example below
  rho = pwcor(AirPassengers, L(AirPassengers)))

head(D(AirPassengers, -2:2, 1:3))     # Sequence of leaded/lagged and iterated differences

# let's do some visual analysis
plot(AirPassengers)                   # Plot the series - seasonal pattern is evident
plot(stl(AirPassengers, "periodic"))  # Seasonal decomposition
plot(D(AirPassengers,c(1,12),1:2))    # Plotting ordinary and seasonal first and second differences
plot(stl(window(D(AirPassengers,12),  # Taking seasonal differences removes most seasonal variation
                1950), "periodic"))

## Time Series Matrix of 4 EU Stock Market Indicators, recorded 260 days per year
plot(D(EuStockMarkets, c(0, 260)))                       # Plot series and annual differences
mod <- lm(DAX ~., L(EuStockMarkets, c(0, 260)))          # Regressing the DAX on its annual lag
summary(mod)                                             # and the levels and annual lags of the others
r <- residuals(mod)                                      # Obtain residuals
pwcor(r, L(r))                                           # Residual Autocorrelation
fFtest(r, L(r))                                          # F-test of residual autocorrelation
                                                         # (better use lmtest::bgtest)
modCO <- lm(QD1.DAX ~., D(L(EuStockMarkets, c(0, 260)),  # Cochrane-Orcutt (1949) estimation
                          rho = pwcor(r, L(r))))
summary(modCO)
rCO <- residuals(modCO)
fFtest(rCO, L(rCO))                                      # No more autocorrelation

## World Development Panel Data
head(fdiff(num_vars(wlddev), 1, 1,                       # Computes differences of numeric variables
           wlddev$country, wlddev$year))                 # fdiff requires external inputs..
head(D(wlddev, 1, 1, ~country, ~year))                   # Differences of numeric variables
head(D(wlddev, 1, 1, ~country))                          # Without t: Works because data is ordered
head(D(wlddev, 1, 1, PCGDP + LIFEEX ~ country, ~year))   # Difference of GDP & Life Expectancy
head(D(wlddev, 0:1, 1, ~ country, ~year, cols = 9:10))   # Same, also retaining original series
head(D(wlddev, 0:1, 1, ~ country, ~year, 9:10,           # Dropping id columns
       keep.ids = FALSE))

# Dynamic Panel Data Models:
summary(lm(D(PCGDP,1,1,iso3c,year) ~                     # Diff. GDP regressed on its lagged level
             L(PCGDP,1,iso3c,year) +                     # and the difference of Life Expectancy
             D(LIFEEX,1,1,iso3c,year), data = wlddev))

g = qF(wlddev$country)                                   # Omitting t and precomputing g allows for
summary(lm(D(PCGDP,1,1,g) ~ L(PCGDP,1,g) +               # a bit more parsimonious specification
             D(LIFEEX,1,1,g), wlddev))

summary(lm(D1.PCGDP ~.,                                  # Now adding level and lagged level of
           L(D(wlddev,0:1,1, ~ country, ~year,9:10),0:1, # LIFEEX and lagged differences rates
             ~ country, ~year, keep.ids = FALSE)[-1]))

## Using plm can make things easier, but avoid attaching or 'with' calls:
pwlddev <- plm::pdata.frame(wlddev, index = c("country","year"))
head(D(pwlddev, 0:1, 1, 9:10))                           # Again differences of LIFEEX and PCGDP
PCGDP <- pwlddev$PCGDP                                   # A panel-Series of GDP per Capita
head(D(PCGDP))                                           # Differencing the panel series
summary(lm(D1.PCGDP ~.,                                  # Running the dynamic model again ->
           data = L(D(pwlddev,0:1,1,9:10),0:1,           # code becomes a bit simpler
                    keep.ids = FALSE)[-1]))

# One could be tempted to also do something like this, but THIS DOES NOT WORK!!:
# -> a pseries is only created when subsetting the pdata.frame using $ or [[
# summary(lm(D(PCGDP) ~ L(D(PCGDP,0:1)) + L(D(LIFEEX,0:1),0:1), pwlddev))

# To make it work, one needs to create pseries
LIFEEX <- pwlddev$LIFEEX
summary(lm(D(PCGDP) ~ L(D(PCGDP,0:1)) + L(D(LIFEEX,0:1),0:1)))  # THIS WORKS !

## Using dplyr:
library(dplyr)
wlddev %>% group_by(country) %>%
  select(PCGDP,LIFEEX) %>% fdiff(0:1,1:2)                # Adding a first and second difference
wlddev %>% group_by(country) %>%
  select(year,PCGDP,LIFEEX) %>% D(0:1,1:2,year)          # Also using t (safer)
wlddev %>% group_by(country) %>%                         # Dropping id's
  select(year,PCGDP,LIFEEX) %>% D(0:1,1:2,year, keep.ids = FALSE)