infer: rep_sample_n – R documentation


rep_sample_n

Perform repeated sampling

Description

These functions extend the functionality of dplyr::sample_n() and dplyr::slice_sample() by allowing for repeated sampling of data. This is especially helpful when creating sampling distributions; see the examples below.
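To make the idea of a sampling distribution concrete, here is a minimal base-R sketch of what repeated sampling buys you. It does not call infer. The population vector `pop` and its 30% "degree" share are hypothetical values chosen only for illustration. Each replicate draws a sample without replacement and records one sample proportion. The spread of those proportions approximates the sampling distribution.

```r
# Minimal base-R sketch of repeated sampling (not infer's implementation).
# `pop` is a hypothetical population: 300 "degree", 700 "no degree".
set.seed(1)
pop <- rep(c("degree", "no degree"), times = c(300, 700))

size <- 50    # sample size of each sample
reps <- 1000  # number of samples to take

# Draw `reps` samples of `size` without replacement and record the
# proportion of "degree" in each one.
p_hats <- replicate(reps, mean(sample(pop, size) == "degree"))

# The collection of p_hats approximates the sampling distribution of p_hat.
summary(p_hats)
```

rep_sample_n() packages the same loop so that each sample stays as rows of a tibble, tagged by replicate, which lets you summarize with dplyr instead of replicate().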

Usage

rep_sample_n(tbl, size, replace = FALSE, reps = 1, prob = NULL)

rep_slice_sample(.data, n = 1, replace = FALSE, weight_by = NULL, reps = 1)

Arguments

`tbl, .data`	Data frame representing the population from which to sample.
`size, n`	Sample size of each sample.
`replace`	Should sampling be with replacement?
`reps`	Number of samples to take, each of size `size` (or `n`).
`prob, weight_by`	A vector of sampling weights, one for each row of `tbl` (or `.data`); it must have length equal to `nrow(tbl)`.
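The weights do not need to sum to 1. The sketch below uses base R's `sample()`, whose `prob` argument likewise accepts unnormalized weights, to show the length requirement and the renormalization. The five-row data frame mirrors the example at the end of this page, but this is an illustration of weighted row sampling in general, not infer's internal code.

```r
# Hypothetical 5-row population and unnormalized weights (sum to 1.5).
df <- data.frame(id = 1:5, letter = c("a", "b", "c", "d", "e"))
w <- c(.5, .4, .3, .2, .1)

# The weight vector must have one entry per row.
stopifnot(length(w) == nrow(df))

# Renormalized probabilities: 5/15, 4/15, 3/15, 2/15, 1/15.
p <- w / sum(w)
print(p)

# base::sample() treats `prob` as weights, so passing `w` directly is fine.
set.seed(2)
rows <- sample(nrow(df), size = 2, replace = FALSE, prob = w)
df[rows, ]
```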

Details

The dplyr::sample_n() function (which rep_sample_n() was originally written to supplement) has been superseded by dplyr::slice_sample(). rep_slice_sample() is a light wrapper around rep_sample_n() whose interface more closely mirrors slice_sample().

Value

A tibble with `reps * size` rows, corresponding to `reps` samples of size `size` from `tbl`, grouped by `replicate`.
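The shape of the return value can be reproduced with a small base-R sketch. `rep_sample_sketch()` and `rep_slice_sketch()` are hypothetical names, not part of infer. The sketch stacks `reps` blocks of `size` sampled rows and adds a `replicate` column. Unlike the real functions, it returns a plain data frame that is not grouped. The wrapper mirrors the argument mapping described under Details: `n` becomes `size` and `weight_by` becomes `prob`.

```r
# Hypothetical base-R sketch of the output shape; not infer's source code.
rep_sample_sketch <- function(tbl, size, replace = FALSE, reps = 1, prob = NULL) {
  if (!is.null(prob) && length(prob) != nrow(tbl)) {
    stop("`prob` must have length equal to nrow(tbl)")
  }
  # One block of `size` sampled rows per replicate, tagged with its index.
  samples <- lapply(seq_len(reps), function(r) {
    rows <- sample(nrow(tbl), size = size, replace = replace, prob = prob)
    cbind(replicate = r, tbl[rows, , drop = FALSE])
  })
  out <- do.call(rbind, samples)
  rownames(out) <- NULL
  out
}

# slice_sample()-style wrapper: `n` maps to `size`, `weight_by` to `prob`.
rep_slice_sketch <- function(.data, n = 1, replace = FALSE,
                             weight_by = NULL, reps = 1) {
  rep_sample_sketch(.data, size = n, replace = replace,
                    reps = reps, prob = weight_by)
}

set.seed(3)
pop <- data.frame(id = 1:20, value = rnorm(20))
res <- rep_slice_sketch(pop, n = 4, reps = 3)
nrow(res)             # reps * n = 12
table(res$replicate)  # 4 rows in each of replicates 1, 2, 3
```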

Examples

library(infer)  # provides rep_sample_n() and the gss dataset
library(dplyr)
library(ggplot2)

# take 1000 samples of size n = 50, without replacement
slices <- gss %>%
  rep_sample_n(size = 50, reps = 1000)

slices

# compute the proportion of respondents with a college
# degree in each replicate
p_hats <- slices %>%
  group_by(replicate) %>%
  summarize(prop_college = mean(college == "degree"))

# plot sampling distribution
ggplot(p_hats, aes(x = prop_college)) +
  geom_density() +
  labs(
    x = "p_hat", y = "Density",
    title = "Sampling distribution of p_hat"
  )

# sampling with probability weights. Note that probabilities are
# automatically renormalized to sum to 1
library(tibble)
df <- tibble(
  id = 1:5,
  letter = factor(c("a", "b", "c", "d", "e"))
)
rep_sample_n(df, size = 2, reps = 5, prob = c(.5, .4, .3, .2, .1))

infer

Tidy Statistical Inference

v0.5.4

CC0

Authors

Andrew Bray [aut, cre], Chester Ismay [aut] (<https://orcid.org/0000-0003-2820-2547>), Evgeni Chasnovski [aut] (<https://orcid.org/0000-0002-1617-4019>), Ben Baumer [aut] (<https://orcid.org/0000-0002-3279-0516>), Mine Cetinkaya-Rundel [aut] (<https://orcid.org/0000-0001-6452-2420>), Simon Couch [ctb], Ted Laderas [ctb] (<https://orcid.org/0000-0002-6207-7068>), Nick Solomon [ctb], Johanna Hardin [ctb], Albert Y. Kim [ctb] (<https://orcid.org/0000-0001-7824-306X>), Neal Fultz [ctb], Doug Friedman [ctb], Richie Cotton [ctb] (<https://orcid.org/0000-0003-2504-802X>), Brian Fannin [ctb]
