Perform repeated sampling
These functions extend the functionality of dplyr::sample_n() and
dplyr::slice_sample() by allowing for repeated sampling of data.
This operation is especially helpful when creating sampling
distributions; see the examples below.
rep_sample_n(tbl, size, replace = FALSE, reps = 1, prob = NULL)

rep_slice_sample(.data, n = 1, replace = FALSE, weight_by = NULL, reps = 1)
tbl, .data | Data frame of population from which to sample.
size, n | Sample size of each sample.
replace | Should sampling be with replacement?
reps | Number of samples of size n = size.
prob, weight_by | A vector of sampling weights, one for each of the rows in tbl (.data).
The dplyr::sample_n() function (to which rep_sample_n() was
originally a supplement) has been superseded by dplyr::slice_sample().
rep_slice_sample() provides a light wrapper around rep_sample_n() whose
interface more closely matches that of slice_sample().
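As a rough illustration of how the two interfaces line up, the calls below request the same kind of sample through each function. This is a sketch that assumes the infer package (which provides both functions and the gss data used in the examples) is installed; the argument correspondence (size to n, prob to weight_by) is taken from the usage and argument listing on this page.

```r
library(infer)

# Older-style interface: the sample size is given as `size`
a <- rep_sample_n(gss, size = 50, reps = 10)

# slice_sample()-style interface: the same sample size is given as `n`
b <- rep_slice_sample(gss, n = 50, reps = 10)

# Both results hold reps * size rows and a `replicate` column
nrow(a)
nrow(b)
```

The rows drawn will differ between the two calls because each call samples at random.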
A tibble with reps * size rows, corresponding to reps
samples of size size from tbl, grouped by replicate.
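To make the shape of the return value concrete, here is a minimal base R sketch of repeated sampling. The helper rep_sample_sketch() and the population data frame are hypothetical names introduced only for illustration; this is not how the package implements rep_sample_n(), and it returns a plain data frame rather than a grouped tibble.

```r
# Hypothetical helper, not part of any package: draws `reps` samples of
# `size` rows each and labels every row with its `replicate` index.
rep_sample_sketch <- function(tbl, size, reps = 1, replace = FALSE, prob = NULL) {
  samples <- lapply(seq_len(reps), function(i) {
    # Pick row indices for this replicate
    rows <- sample.int(nrow(tbl), size = size, replace = replace, prob = prob)
    cbind(replicate = i, tbl[rows, , drop = FALSE])
  })
  # Stack the replicates into one data frame
  out <- do.call(rbind, samples)
  rownames(out) <- NULL
  out
}

set.seed(2024)
population <- data.frame(id = 1:20, value = rnorm(20))
sketch <- rep_sample_sketch(population, size = 5, reps = 3)

nrow(sketch)             # reps * size = 15
table(sketch$replicate)  # 5 rows in each of the 3 replicates
```

Stacking the replicates in one long table with an index column is what makes the later group_by(replicate) step in the examples possible.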
library(infer)   # provides rep_sample_n() and the gss data
library(dplyr)
library(ggplot2)

# take 1000 samples of size n = 50, without replacement
slices <- gss %>%
  rep_sample_n(size = 50, reps = 1000)

slices

# compute the proportion of respondents with a college
# degree in each replicate
p_hats <- slices %>%
  group_by(replicate) %>%
  summarize(prop_college = mean(college == "degree"))

# plot sampling distribution
ggplot(p_hats, aes(x = prop_college)) +
  geom_density() +
  labs(
    x = "p_hat", y = "Density",
    title = "Sampling distribution of p_hat"
  )

# sampling with probability weights. Note probabilities are automatically
# renormalized to sum to 1
library(tibble)
df <- tibble(
  id = 1:5,
  letter = factor(c("a", "b", "c", "d", "e"))
)
rep_sample_n(df, size = 2, reps = 5, prob = c(.5, .4, .3, .2, .1))
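
The weights in the last example sum to 1.5 rather than 1. The short calculation below shows what renormalizing them gives, assuming renormalization means dividing each weight by the total (the convention base R's sample() follows). The variable names are illustrative only.

```r
# The weights from the example above
weights <- c(.5, .4, .3, .2, .1)
sum(weights)  # 1.5, so they are not yet probabilities

# Divide by the total so the values sum to 1
probs <- weights / sum(weights)
round(probs, 3)  # 0.333 0.267 0.200 0.133 0.067
```

Because that example samples without replacement, these values describe the first draw in each replicate; later draws are made from the rows that remain.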