
Categorical

Categorical distribution


Description

Probability mass function, distribution function, quantile function and random generation for the categorical distribution.

Usage

dcat(x, prob, log = FALSE)

pcat(q, prob, lower.tail = TRUE, log.p = FALSE)

qcat(p, prob, lower.tail = TRUE, log.p = FALSE, labels)

rcat(n, prob, labels)

rcatlp(n, log_prob, labels)

Arguments

x, q

vector of quantiles.

prob, log_prob

vector of length m, or m-column matrix of non-negative weights (or their logarithms in log_prob).

log, log.p

logical; if TRUE, probabilities p are given as log(p).

lower.tail

logical; if TRUE (default), probabilities are P[X ≤ x]; otherwise, P[X > x].

p

vector of probabilities.

labels

if provided, a labeled factor vector is returned. The number of labels must equal the number of categories (the number of columns in prob).

n

number of observations. If length(n) > 1, the length is taken to be the number required.

Details

Probability mass function

Pr(X = k) = w[k]/sum(w)

Cumulative distribution function

Pr(X <= k) = sum(w[1:k])/sum(w)
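
As a minimal sketch of the two formulas above, the following base R snippet computes the probability mass function and the cumulative distribution function directly from a vector of unnormalized weights. The weight vector w and the variable names pmf and cdf are chosen for illustration only and are not part of extraDistr.

```r
# Illustrative only: PMF and CDF of a categorical distribution
# computed from unnormalized, non-negative weights (base R, no packages).
w <- c(2, 4, 3, 1)

pmf <- w / sum(w)          # Pr(X = k)  = w[k] / sum(w),        k = 1, ..., m
cdf <- cumsum(w) / sum(w)  # Pr(X <= k) = sum(w[1:k]) / sum(w), k = 1, ..., m

pmf
cdf
```

Because the weights are normalized by their sum, any positive rescaling of w gives the same pmf and cdf.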

It is possible to sample from a categorical distribution parametrized by a vector of unnormalized log-probabilities α[1],...,α[m] without leaving the log space by employing the Gumbel-max trick (Maddison, Tarlow and Minka, 2014). If g[1],...,g[m] are independent samples from the Gumbel distribution with cumulative distribution function F(g) = exp(-exp(-g)), then k = argmax(g[i]+α[i]) is a draw from the categorical distribution parametrized by the vector of probabilities p[1],...,p[m], such that p[i] = exp(α[i])/sum(exp(α)). This is implemented in the rcatlp function, which is parametrized by the vector of log-probabilities log_prob.
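
The Gumbel-max trick described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the implementation used by rcatlp: the function name gumbel_max_draw is hypothetical, the Gumbel noise is generated by inverting F(g) = exp(-exp(-g)), and log_prob is assumed to be a plain vector rather than a matrix.

```r
# Illustrative sketch of the Gumbel-max trick (hypothetical helper,
# not the extraDistr implementation of rcatlp).
gumbel_max_draw <- function(n, log_prob) {
  m <- length(log_prob)
  # Standard Gumbel noise by inversion: F(g) = exp(-exp(-g))  =>  g = -log(-log(u))
  g <- matrix(-log(-log(runif(n * m))), nrow = n, ncol = m)
  # Add alpha[i] to column i, then take the row-wise argmax
  scores <- sweep(g, 2, log_prob, `+`)
  max.col(scores, ties.method = "first")
}

set.seed(1)
alpha <- log(c(2, 4, 3, 1))       # unnormalized log-probabilities
x <- gumbel_max_draw(1e5, alpha)

# Empirical frequencies should be close to exp(alpha) / sum(exp(alpha))
prop.table(table(x))
exp(alpha) / sum(exp(alpha))
```

The weights are never exponentiated or normalized during sampling, which is the point of staying in log space when the log-probabilities are very small or very large in magnitude.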

References

Maddison, C. J., Tarlow, D., & Minka, T. (2014). A* sampling. [In:] Advances in Neural Information Processing Systems (pp. 3086-3094). https://arxiv.org/abs/1411.0030

Examples

library(extraDistr)

# Generating 10 random draws from a categorical distribution
# with k=3 categories occurring with equal probabilities
# parametrized using a vector

rcat(10, c(1/3, 1/3, 1/3))

# or with k=5 categories parametrized using a matrix of probabilities
# (generated from Dirichlet distribution)

p <- rdirichlet(10, c(1, 1, 1, 1, 1))
rcat(10, p)

x <- rcat(1e5, c(0.2, 0.4, 0.3, 0.1))
plot(prop.table(table(x)), type = "h")
lines(0:5, dcat(0:5, c(0.2, 0.4, 0.3, 0.1)), col = "red")

p <- rdirichlet(1, rep(1, 20))
x <- rcat(1e5, matrix(rep(p, 2), nrow = 2, byrow = TRUE))
xx <- 0:21
plot(prop.table(table(x)))
lines(xx, dcat(xx, p), col = "red")

xx <- seq(0, 21, by = 0.01)
plot(ecdf(x))
lines(xx, pcat(xx, p), col = "red", lwd = 2)

pp <- seq(0, 1, by = 0.001)
plot(ecdf(x))
lines(qcat(pp, p), pp, col = "red", lwd = 2)

extraDistr

Additional Univariate and Multivariate Distributions

v1.9.1
GPL-2
Authors
Tymoteusz Wolodzko
Initial release
2020-08-20
