collapse: fHDbetween_fHDwithin – R documentation


fHDbetween_fHDwithin

Higher-Dimensional Centering and Linear Prediction

Description

fHDbetween is a generalization of fbetween to efficiently predict with multiple factors and linear models (i.e. predict with vectors/factors, matrices, or data frames/lists where the latter may contain multiple factor variables). Similarly fHDwithin is a generalization of fwithin to center on multiple factors and partial-out linear models.

The corresponding operators HDB and HDW additionally allow predicting / partialling out full lm() formulas with interactions between variables.

Usage

fHDbetween(x, ...)
 fHDwithin(x, ...)
       HDB(x, ...)
       HDW(x, ...)

## Default S3 method:
fHDbetween(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, lm.method = "qr", ...)
## Default S3 method:
fHDwithin(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, lm.method = "qr", ...)
## Default S3 method:
HDB(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, lm.method = "qr", ...)
## Default S3 method:
HDW(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, lm.method = "qr", ...)

## S3 method for class 'matrix'
fHDbetween(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, lm.method = "qr", ...)
## S3 method for class 'matrix'
fHDwithin(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, lm.method = "qr", ...)
## S3 method for class 'matrix'
HDB(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, stub = "HDB.", lm.method = "qr", ...)
## S3 method for class 'matrix'
HDW(x, fl, w = NULL, na.rm = TRUE, fill = FALSE, stub = "HDW.", lm.method = "qr", ...)

## S3 method for class 'data.frame'
fHDbetween(x, fl, w = NULL, na.rm = TRUE, fill = FALSE,
           variable.wise = FALSE, lm.method = "qr", ...)
## S3 method for class 'data.frame'
fHDwithin(x, fl, w = NULL, na.rm = TRUE, fill = FALSE,
          variable.wise = FALSE, lm.method = "qr", ...)
## S3 method for class 'data.frame'
HDB(x, fl, w = NULL, cols = is.numeric, na.rm = TRUE, fill = FALSE,
    variable.wise = FALSE, stub = "HDB.", lm.method = "qr", ...)
## S3 method for class 'data.frame'
HDW(x, fl, w = NULL, cols = is.numeric, na.rm = TRUE, fill = FALSE,
    variable.wise = FALSE, stub = "HDW.", lm.method = "qr", ...)

# Methods for compatibility with plm:

## S3 method for class 'pseries'
fHDbetween(x, w = NULL, na.rm = TRUE, fill = TRUE, ...)
## S3 method for class 'pseries'
fHDwithin(x, w = NULL, na.rm = TRUE, fill = TRUE, ...)
## S3 method for class 'pseries'
HDB(x, w = NULL, na.rm = TRUE, fill = TRUE, ...)
## S3 method for class 'pseries'
HDW(x, w = NULL, na.rm = TRUE, fill = TRUE, ...)

## S3 method for class 'pdata.frame'
fHDbetween(x, w = NULL, na.rm = TRUE, fill = TRUE,
           variable.wise = TRUE, ...)
## S3 method for class 'pdata.frame'
fHDwithin(x, w = NULL, na.rm = TRUE, fill = TRUE,
          variable.wise = TRUE, ...)
## S3 method for class 'pdata.frame'
HDB(x, w = NULL, cols = is.numeric, na.rm = TRUE, fill = TRUE,
    variable.wise = TRUE, stub = "HDB.", ...)
## S3 method for class 'pdata.frame'
HDW(x, w = NULL, cols = is.numeric, na.rm = TRUE, fill = TRUE,
    variable.wise = TRUE, stub = "HDW.", ...)

Arguments

`x`	a numeric vector, matrix, data frame, panel series (`plm::pseries`) or panel data frame (`plm::pdata.frame`).
`fl`	a numeric vector, factor, matrix, data frame or list (which may or may not contain factors). In the data frame method `fl` can also be a one- or two-sided `lm()` formula with variables contained in `x`. Interactions `(:)` and full interactions `(*)` are supported. See Examples and the Note.
`w`	a vector of (non-negative) weights.
`cols`	data.frame methods: Select columns to center (partial-out) or predict using column names, indices, a logical vector or a function. Unless specified otherwise all numeric columns are selected. If `NULL`, all variables are selected.
`na.rm`	remove missing values from both `x` and `fl`. By default, rows with missing values in `x` or `fl` are removed. In that case an attribute "na.rm" is attached containing the rows removed.
`fill`	If `na.rm = TRUE`, `fill = TRUE` will not remove rows with missing values in `x` or `fl`, but fill them with `NA`'s.
`variable.wise`	data.frame methods: Setting `variable.wise = TRUE` will process each column individually i.e. use all non-missing cases in each column and in `fl` (`fl` is only checked for missing values if `na.rm = TRUE`). This is a lot less efficient but uses all data available in each column.
`stub`	a prefix or stub to rename all transformed columns. `FALSE` will not rename columns.
`lm.method`	character. The linear fitting method. Supported are `"chol"` and `"qr"`. See `flm`.
`...`	further arguments passed to `fixest::demean` and `chol` / `qr`. Possible choices are `tol` to set a uniform numerical tolerance for the entire fitting process, or `nthreads` and `iter` to govern the higher-order centering process.
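
The interplay of `na.rm` and `fill` described above can be illustrated with a small sketch. The vectors `x` and `fl` below are made-up illustration data, and the comments only restate what the argument descriptions say; the exact content of the "na.rm" attribute is not asserted here.

```r
library(collapse)

x  <- c(1, 2, NA, 4, 5, 6)   # hypothetical data: missing value in x (row 3)
fl <- c(2, 1, 4, 3, NA, 7)   # hypothetical data: missing value in fl (row 5)

# Default (na.rm = TRUE, fill = FALSE): rows 3 and 5 are dropped before fitting
r_drop <- fHDwithin(x, fl)
length(r_drop)
attr(r_drop, "na.rm")        # per the `na.rm` description, the rows removed

# fill = TRUE: the output keeps the input length, incomplete rows are set to NA
r_fill <- fHDwithin(x, fl, fill = TRUE)
length(r_fill)
which(is.na(r_fill))
```

Using `fill = TRUE` is convenient when the result has to be written back into the original data frame as a new column.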

Details

fHDbetween/HDB and fHDwithin/HDW are powerful functions for high-dimensional linear prediction problems involving large factors and datasets, but can just as well handle ordinary regression problems. They are implemented as efficient wrappers around fbetween / fwithin, flm and fixest::demean (imported for higher-order centering tasks).

Intended areas of use are to efficiently obtain residuals and predicted values from data, and to prepare data for complex linear models involving multiple levels of fixed effects. Such models can now be fitted using lm() on data prepared with fHDwithin / HDW (relying on bootstrapped SE's for inference, or implementing the appropriate corrections). See Examples.

If fl is a vector or matrix, the results are identical to lm, i.e. fHDbetween / HDB returns fitted(lm(x ~ fl)) and fHDwithin / HDW returns residuals(lm(x ~ fl)). If fl is a list containing factors, all variables in x and non-factor variables in fl are centered on these factors using either fbetween / fwithin for a single factor or fixest::demean for multiple factors. Afterwards the centered data is regressed on the centered predictors. If fl is just a list of factors, fHDwithin/HDW returns the centered data and fHDbetween/HDB the corresponding means. Take as a most general example a list fl = list(fct1, fct2, ..., var1, var2, ...) where fcti are factors and vari are continuous variables. The output of fHDwithin/HDW | fHDbetween/HDB will then be identical to calling resid | fitted on lm(x ~ fct1 + fct2 + ... + var1 + var2 + ...). The computations performed by fHDwithin/HDW and fHDbetween/HDB are however much faster and more memory efficient than lm because factors are not passed to model.matrix and expanded to matrices of dummies but projected beforehand.
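
The equivalence with lm() stated above can be checked directly. The following is a minimal sketch on the built-in mtcars data; the object names (`hdw_vec`, `lm_list`, ...) are illustrative only, and a single factor is used so that only fwithin (not fixest::demean) should be involved.

```r
library(collapse)

x <- mtcars$mpg

# Case 1: fl is a numeric vector -> compare with a simple regression
hdw_vec <- as.numeric(fHDwithin(x, mtcars$carb))
hdb_vec <- as.numeric(fHDbetween(x, mtcars$carb))
lm_vec  <- lm(mpg ~ carb, data = mtcars)
all.equal(hdw_vec, unname(resid(lm_vec)))
all.equal(hdb_vec, unname(fitted(lm_vec)))

# Case 2: fl is a list holding one factor and one continuous variable
fl <- list(cyl = qF(mtcars$cyl), wt = mtcars$wt)
hdw_list <- as.numeric(fHDwithin(x, fl))
lm_list  <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
all.equal(hdw_list, unname(resid(lm_list)))
```

The gain over lm() in case 2 comes from centering on `cyl` instead of expanding it into dummy columns, which matters mostly when the factor has many levels.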

The formula interface to the data.frame method (only supported by the operators HDW | HDB) provides ease of use and allows for additional modeling complexity. For example it is possible to project out formulas like HDW(data, ~ fct1*var1 + fct2:fct3 + var2:fct2:fct3 + var2:var3 + poly(var5,3)*fct5) containing simple (:) or full (*) interactions of factors with continuous variables or polynomials of continuous variables, and two- or three-way interactions of factors and continuous variables. If the formula is one-sided as in the example above (the space left of (~) is left empty), the formula is applied to all variables selected through cols. The specification provided in cols (default: all numeric variables not used in the formula) can be overridden by supplying one or more dependent variables. For example HDW(data, var1 + var2 ~ fct1 + fct2) will return a data.frame with var1 and var2 centered on fct1 and fct2.
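
A short sketch of the one-sided and two-sided formula forms may help here. It uses mtcars with a deliberately simple right-hand side (one factor plus one continuous variable); the object names `d1`, `d2`, `b_fwl` and `b_full` are illustrative only.

```r
library(collapse)

# Two-sided formula: only the variables on the left-hand side are transformed
d2 <- HDW(mtcars, mpg + hp ~ factor(cyl) + wt)
names(d2)

# One-sided formula: applied to the columns selected through `cols`
d1 <- HDW(mtcars, ~ factor(cyl) + wt, cols = c("mpg", "hp"))
names(d1)

# stub = FALSE keeps the original column names
d0 <- HDW(mtcars, mpg + hp ~ factor(cyl) + wt, stub = FALSE)
names(d0)

# Regressing the partialled-out variables on each other should reproduce
# the hp coefficient of the full model (Frisch-Waugh-Lovell)
b_fwl  <- unname(coef(lm(HDW.mpg ~ HDW.hp, data = d2))["HDW.hp"])
b_full <- unname(coef(lm(mpg ~ hp + factor(cyl) + wt, data = mtcars))["hp"])
all.equal(b_fwl, b_full)
```

Note that the standard errors reported by the second-stage lm() do not account for the degrees of freedom used in the first stage, which is why Details mentions bootstrapping or corrections.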

The special methods for plm::pseries and plm::pdata.frame center a panel series or variables in a panel data frame on all panel-identifiers. By default in these methods fill = TRUE and variable.wise = TRUE, so missing values are kept. This change in the default arguments was done to ensure a coherent framework of functions and operators applied to plm panel data classes.
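
As a rough sketch of the plm methods, assuming the plm package is installed (and fixest, since two panel identifiers are involved), a panel data frame can be built from the wlddev data shipped with collapse. The name `pwlddev` is illustrative.

```r
library(collapse)

# Assumption: plm is installed; iso3c and year serve as panel identifiers
pwlddev <- plm::pdata.frame(wlddev, index = c("iso3c", "year"))

# Panel series: centered on both identifiers (country and year)
r_pcgdp <- HDW(pwlddev$PCGDP)

# With the default fill = TRUE, missing values are kept, so the length is unchanged
length(r_pcgdp) == nrow(pwlddev)

# Panel data frame: transform selected columns, variable by variable
head(HDW(pwlddev, cols = c("PCGDP", "LIFEEX")))
```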

Value

HDB returns fitted values of regressing x on fl. HDW returns residuals. See Details and Examples.
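
Since HDB returns fitted values and HDW residuals from the same regression, the two outputs should add up to the original data. A minimal sketch with a matrix of predictors (object names are illustrative):

```r
library(collapse)

x  <- mtcars$mpg
fl <- qM(mtcars[c("wt", "hp")])   # predictors as a numeric matrix

fit <- as.numeric(HDB(x, fl))     # fitted values of x regressed on fl
res <- as.numeric(HDW(x, fl))     # residuals of the same regression

all.equal(fit + res, x)           # fitted + residuals recover x
```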

Note

On the differences between `fHDwithin/HDW`... and `fwithin/W`...:

fHDwithin/HDW can center data on multiple factors and also partial out continuous variables and factor-continuous interactions while fwithin/W only centers on one factor or the interaction of a set of factors, and does that very efficiently.
HDW(data, ~ qF(group1) + qF(group2)) simultaneously centers numeric variables in data on group1 and group2, while W(data, ~ group1 + group2) centers data on the interaction of group1 and group2. The equivalent operation in HDW would be: HDW(data, ~ qF(group1):qF(group2)).
W always does computations on the variable-wise complete observations (in both matrices and data frames), whereas by default HDW removes all cases missing in either x or fl. In short, W(data, ~ group1 + group2) is actually equivalent to HDW(data, ~ qF(group1):qF(group2), variable.wise = TRUE). HDW(data, ~ qF(group1):qF(group2)) would remove any missing cases.
fbetween/B and fwithin/W have options to fill missing cases using group-averages and to add the overall mean back to group-demeaned data. These options are not available in fHDbetween/HDB and fHDwithin/HDW. Since HDB and HDW by default remove missing cases, they also don't have options to keep grouping-columns as in B and W.
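
The first two points can be made concrete with the default methods. The sketch below assumes fixest is installed (needed for centering on two factors) and compares both functions with base R equivalents; because the higher-order centering is iterative, a looser tolerance is used in the comparison with lm(). Object names are illustrative.

```r
library(collapse)

x  <- mtcars$mpg
g1 <- qF(mtcars$cyl)
g2 <- qF(mtcars$am)

# fwithin with a list of factors centers on the INTERACTION of g1 and g2,
# i.e. it subtracts the mean of each (g1, g2) cell
w_int <- fwithin(x, list(g1, g2))
all.equal(w_int, x - ave(x, g1, g2))

# fHDwithin with the same list centers on g1 and g2 SIMULTANEOUSLY,
# which corresponds to the residuals of an additive two-way model
hd_add <- as.numeric(fHDwithin(x, list(g1, g2)))
all.equal(hd_add, unname(resid(lm(x ~ g1 + g2))), tolerance = 1e-4)
```

The two results differ in general, because the interaction model fits one mean per cell while the additive model fits one effect per level of each factor.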

Examples

library(collapse)                              # Provides HDW / HDB, qM, qF and the wlddev data
HDW(mtcars$mpg, mtcars$carb)                   # Simple regression problems..
HDW(mtcars$mpg, mtcars[-1])
HDW(mtcars$mpg, qM(mtcars[-1]))
head(HDW(qM(mtcars[3:4]), mtcars[1:2]))
head(HDW(iris[1:2], iris[3:4]))                # Partialling columns 3 and 4 out of columns 1 and 2
head(HDW(iris[1:2], iris[3:5]))                # Adding the Species factor -> fixed effect
 
head(HDW(wlddev, PCGDP + LIFEEX ~ iso3c + qF(year))) # Partialling out 2 fixed effects
head(HDW(wlddev, PCGDP + LIFEEX ~ iso3c + qF(year), variable.wise = TRUE)) # Variable-wise
head(HDW(wlddev, PCGDP + LIFEEX ~ iso3c + qF(year) + ODA)) # Adding ODA as a continuous regressor
head(HDW(wlddev, PCGDP + LIFEEX ~ iso3c:qF(decade) + qF(year) + ODA)) # Country-decade and year FE's

head(HDW(wlddev, PCGDP + LIFEEX ~ iso3c*year))          # Country specific time trends
head(HDW(wlddev, PCGDP + LIFEEX ~ iso3c*poly(year, 3))) # Country specific cubic trends

# More complex examples
lm(HDW.mpg ~ HDW.hp, data = HDW(mtcars, ~ factor(cyl)*carb + vs + wt:gear + wt:gear:carb))
lm(mpg ~ hp + factor(cyl)*carb + vs + wt:gear + wt:gear:carb, data = mtcars)

lm(HDW.mpg ~ HDW.hp, data = HDW(mtcars, ~ factor(cyl)*carb + vs + wt:gear))
lm(mpg ~ hp + factor(cyl)*carb + vs + wt:gear, data = mtcars)

lm(HDW.mpg ~ HDW.hp, data = HDW(mtcars, ~ cyl*carb + vs + wt:gear))
lm(mpg ~ hp + cyl*carb + vs + wt:gear, data = mtcars)

lm(HDW.mpg ~ HDW.hp, data = HDW(mtcars, mpg + hp ~ cyl*carb + factor(cyl)*poly(drat,2)))
lm(mpg ~ hp + cyl*carb + factor(cyl)*poly(drat,2), data = mtcars)

collapse

Advanced and Fast Data Transformation

v1.5.3

GPL (>= 2) | file LICENSE

Authors

Sebastian Krantz [aut, cre], Matt Dowle [ctb], Arun Srinivasan [ctb], Laurent Berge [ctb], Dirk Eddelbuettel [ctb], Josh Pasek [ctb], Kevin Tappe [ctb], R Core Team and contributors worldwide [ctb], Martyn Plummer [cph], 1999-2016 The R Core Team [cph]

Initial release

2021-03-05
