Fast (Grouped, Weighted) N'th Element/Quantile for Matrix-Like Objects
fnth (column-wise) returns the n'th smallest element from a set of unsorted elements x corresponding to an integer index (n), or to a probability between 0 and 1. If n is passed as a probability, ties can be resolved using the lower, upper, or (default) average of the possible elements. These are discontinuous and fast methods to estimate a sample quantile.
fnth(x, n = 0.5, ...)

## Default S3 method:
fnth(x, n = 0.5, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
     use.g.names = TRUE, ties = "mean", ...)

## S3 method for class 'matrix'
fnth(x, n = 0.5, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
     use.g.names = TRUE, drop = TRUE, ties = "mean", ...)

## S3 method for class 'data.frame'
fnth(x, n = 0.5, g = NULL, w = NULL, TRA = NULL, na.rm = TRUE,
     use.g.names = TRUE, drop = TRUE, ties = "mean", ...)

## S3 method for class 'grouped_df'
fnth(x, n = 0.5, w = NULL, TRA = NULL, na.rm = TRUE,
     use.g.names = FALSE, keep.group_vars = TRUE, keep.w = TRUE,
     ties = "mean", ...)
x
    a numeric vector, matrix, data frame or grouped data frame (class 'grouped_df').
n
    the element to return using a single integer index such that
g
    a factor,
w
    a numeric vector of (non-negative) weights, may contain missing values.
TRA
    an integer or quoted operator indicating the transformation to perform: 1 - "replace_fill" | 2 - "replace" | 3 - "-" | 4 - "-+" | 5 - "/" | 6 - "%" | 7 - "+" | 8 - "*" | 9 - "%%" | 10 - "-%%". See
na.rm
    logical. Skip missing values in
use.g.names
    logical. Make group-names and add to the result as names (default method) or row-names (matrix and data frame methods). No row-names are generated for data.table's.
ties
    an integer or character string specifying the method to resolve ties between adjacent qualifying elements:
drop
    matrix and data.frame method: Logical.
keep.group_vars
    grouped_df method: Logical.
keep.w
    grouped_df method: Logical. Retain
...
    arguments to be passed to or from other methods.
This is an R port of std::nth_element, an efficient partial sorting algorithm in C++. It is also used to calculate the median (in fact the default fnth(x, n = 0.5) is identical to fmedian(x), so see also the details for fmedian).
fnth generalizes the principles of median value calculation to find arbitrary elements. It offers considerable flexibility by providing both simple order statistics and simple discontinuous quantile estimation. Regarding the former, setting n to an index between 1 and NROW(x) will return the n'th smallest element of x, about 2x faster than sort(x, partial = n)[n]. As to the latter, setting n to a probability between 0 and 1 will return the corresponding element of x, and resolve ties between multiple qualifying elements (such as when n = 0.5 and x has an even number of elements) using the arithmetic average (ties = "mean"), the smallest (ties = "min") or the largest (ties = "max") of those elements.
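The two uses of n described above can be illustrated with a minimal base R sketch. It does not call fnth and is not the package's implementation; it covers only the case named in the text (n = 0.5 with an even number of elements), where the two middle elements of the sorted vector are assumed to be the qualifying candidates. The variable names are chosen for illustration.

```r
# Base R sketch only; it does not call fnth().
x <- c(7, 1, 5, 3, 9, 4)            # six unsorted values (even length)

# Order statistic: the n'th smallest element via a partial sort.
n <- 2L
nth_smallest <- sort(x, partial = n)[n]

# Probability n = 0.5 on an even-length x: the two middle elements qualify.
# ASSUMPTION: the candidates are the sorted elements at positions N/2 and N/2 + 1.
s <- sort(x)
N <- length(x)
candidates <- s[c(N / 2, N / 2 + 1)]

tie_mean <- mean(candidates)        # analogue of ties = "mean"
tie_min  <- min(candidates)         # analogue of ties = "min"
tie_max  <- max(candidates)         # analogue of ties = "max"
```

With the averaging rule, the result coincides with median(x), in line with the statement that the default fnth(x, n = 0.5) is identical to fmedian(x).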
If n > 1 is used and x contains missing values (and na.rm = TRUE, otherwise NA is returned), n is internally converted to a probability using p = (n-1)/(NROW(x)-1), and that probability is applied to the set of complete elements (of each column if x is a matrix or data frame) to find the as.integer(p*(fNobs(x)-1))+1L'th element (which corresponds to option ties = "min"). Note that it is necessary to subtract and add 1 so that n = 1 corresponds to p = 0 and n = NROW(x) to p = 1.
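The conversion above can be traced by hand in base R. This is a sketch of the two formulas as stated in the text, not the package's C++ code; nth_index is a hypothetical helper name, and sum(!is.na(x)) stands in for fNobs(x).

```r
# Base R sketch of the index conversion; it does not call fnth().
# `nth_index` is a hypothetical helper used only for illustration.
nth_index <- function(n, n_total, n_obs) {
  p <- (n - 1) / (n_total - 1)          # integer index -> probability
  as.integer(p * (n_obs - 1)) + 1L      # probability -> index among complete elements
}

x <- c(10, NA, 30, 20, NA, 50, 40, 60)  # 8 elements, 6 of them non-missing
n_obs <- sum(!is.na(x))                 # stand-in for fNobs(x)

k <- nth_index(5, NROW(x), n_obs)       # requested n = 5
result <- sort(x)[k]                    # sort() drops NA by default

# The end points map as described: n = 1 -> first, n = NROW(x) -> last.
first_k <- nth_index(1, NROW(x), n_obs)
last_k  <- nth_index(NROW(x), NROW(x), n_obs)
```

Here p = 4/7, so the requested 5th element of the full vector becomes the 3rd element of the six complete values.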
When using grouped computations (supplying a vector or list to g subdividing x) and n > 1 is used, it is transformed to a probability p = (n-1)/(NROW(x)/ng-1) (where ng is the number of unique groups in g) and ties = "min" is used to sort out clashes. This could be useful, for example, to return the n'th smallest element of each group in a balanced panel, but with unequal group sizes it is more intuitive to pass a probability to n.
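The following base R sketch shows why an integer n behaves as expected in a balanced panel but not with unequal group sizes. It assumes that, within each group, the probability is mapped to a position with the same as.integer(p*(size-1))+1L rule given for ties = "min"; group_nth_min is a hypothetical helper, not a function of the package.

```r
# Base R sketch; it does not call fnth().
# `group_nth_min` is a hypothetical helper used only for illustration.
group_nth_min <- function(x, g, n) {
  ng <- length(unique(g))
  p <- (n - 1) / (NROW(x) / ng - 1)     # formula from the text
  # ASSUMPTION: each group applies the ties = "min" index rule to its own size.
  tapply(x, g, function(v) sort(v)[as.integer(p * (length(v) - 1)) + 1L])
}

x <- c(5, 1, 4, 2, 3, 50, 10, 40, 20, 30)

# Balanced: two groups of five, so n = 2 gives the 2nd smallest of each group.
g_bal <- rep(c("a", "b"), each = 5)
balanced <- group_nth_min(x, g_bal, n = 2)

# Unbalanced: groups of size 3 and 7 share the same p = 0.25,
# which points at the 1st element of "a" and the 2nd element of "b".
g_unbal <- rep(c("a", "b"), times = c(3, 7))
unbalanced <- group_nth_min(x, g_unbal, n = 2)
```

In the unbalanced case the small group returns its minimum rather than its 2nd smallest value, which is the reason a probability is the more intuitive input there.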
If weights are used, the same principles apply as for weighted median calculation: A target partial sum of weights p*sum(w) is calculated, and the weighted n'th element is the element k such that all elements smaller than k have a sum of weights <= p*sum(w), and all elements larger than k have a sum of weights <= (1 - p)*sum(w). If the partial sum of weights (p*sum(w)) is reached exactly for some element k, then (summing from the lower end) both k and k+1 would qualify as the weighted n'th element, as would any zero-weight elements that follow k. If n > 1, the lowest of those elements is chosen (congruent with the unweighted behavior), but if 0 < n < 1, the ties option regulates how to resolve such conflicts, yielding lower-weighted, upper-weighted or (default) average weighted n'th elements.
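The definition above can be transcribed directly into a brute-force base R sketch. This is not the package's radixorder-based algorithm; weighted_nth is a hypothetical function, and the sketch assumes distinct values in x, strictly positive weights and no missing values.

```r
# Base R sketch of the weighted definition; it does not call fnth().
# `weighted_nth` is a hypothetical helper used only for illustration.
# ASSUMPTIONS: distinct x values, strictly positive weights, no NAs.
weighted_nth <- function(x, w, p, ties = c("mean", "min", "max")) {
  ties <- match.arg(ties)
  o <- order(x)
  xs <- x[o]
  ws <- w[o]
  total <- sum(ws)
  below <- cumsum(ws) - ws              # weight of all smaller elements
  above <- total - cumsum(ws)           # weight of all larger elements
  qualifies <- below <= p * total & above <= (1 - p) * total
  cand <- xs[qualifies]
  switch(ties, mean = mean(cand), min = min(cand), max = max(cand))
}

x <- c(3, 1, 4, 2)

# Equal weights: the target p * sum(w) = 2 is hit exactly, so 2 and 3 both qualify.
w_equal <- c(1, 1, 1, 1)
tie_mean <- weighted_nth(x, w_equal, 0.5)
tie_min  <- weighted_nth(x, w_equal, 0.5, ties = "min")
tie_max  <- weighted_nth(x, w_equal, 0.5, ties = "max")

# Unequal weights: only one element satisfies both conditions.
w_unequal <- c(1, 2, 1, 2)              # weights aligned with x
single <- weighted_nth(x, w_unequal, 0.5)
```

With equal weights the averaged result equals the ordinary median of x; with the unequal weights the extra mass on the smaller values pulls the weighted median down to 2.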
The weighted n'th element is computed using radixorder to first obtain an ordering of all elements, so it is considerably more computationally expensive than the unweighted version. With groups, the entire vector is also ordered, and the weighted n'th element is computed in a single ordered pass through the data (after calculating partial-group sums of the weights, skipping weights for which x is missing).
If x is a matrix or data frame, these computations are performed independently for each column. Column-attributes and overall attributes of a data frame are preserved (if g is used or drop = FALSE).
The (w weighted) n'th element of x, grouped by g, or (if TRA is used) x transformed by its n'th element, grouped by g.
## default vector method
mpg <- mtcars$mpg
fnth(mpg)                          # Simple nth element: Median (same as fmedian(mpg))
fnth(mpg, 5)                       # 5th smallest element
sort(mpg, partial = 5)[5]          # Same using base R, fnth is 2x faster.
fnth(mpg, 0.75)                    # Third quartile
fnth(mpg, 0.75, w = mtcars$hp)     # Weighted third quartile: Weighted by hp
fnth(mpg, 0.75, TRA = "-")         # Simple transformation: Subtract third quartile
fnth(mpg, 0.75, mtcars$cyl)        # Grouped third quartile
fnth(mpg, 0.75, mtcars[c(2,8:9)])  # More groups..
g <- GRP(mtcars, ~ cyl + vs + am)  # Precomputing groups gives more speed !
fnth(mpg, 0.75, g)
fnth(mpg, 0.75, g, mtcars$hp)      # Grouped weighted third quartile
fnth(mpg, 0.75, g, TRA = "-")      # Groupwise subtract third quartile
fnth(mpg, 0.75, g, mtcars$hp, "-") # Groupwise subtract weighted third quartile

## data.frame method
fnth(mtcars, 0.75)
head(fnth(mtcars, 0.75, TRA = "-"))
fnth(mtcars, 0.75, g)
fnth(fgroup_by(mtcars, cyl, vs, am), 0.75) # Another way of doing it..
fnth(mtcars, 0.75, g, use.g.names = FALSE) # No row-names generated

## matrix method
m <- qM(mtcars)
fnth(m, 0.75)
head(fnth(m, 0.75, TRA = "-"))
fnth(m, 0.75, g) # etc..

library(dplyr)
## grouped_df method
mtcars %>% group_by(cyl,vs,am) %>% fnth(0.75)
mtcars %>% group_by(cyl,vs,am) %>% fnth(0.75, hp)         # Weighted
mtcars %>% fgroup_by(cyl,vs,am) %>% fnth(0.75)            # Faster grouping!
mtcars %>% fgroup_by(cyl,vs,am) %>% fnth(0.75, TRA = "/") # Divide by third quartile
mtcars %>% fgroup_by(cyl,vs,am) %>% fselect(mpg, hp) %>%  # Faster selecting
  fnth(0.75, hp, "/") # Divide mpg by its third weighted group-quartile, using hp as weights