Selecting by name
In this chapter we will have a look at the pres_results dataset from the politicaldata package. It contains data about US presidential elections since 1976, converted to a tibble for nicer printing.
# A tibble: 561 x 6
   year state total_votes   dem   rep   other
  <dbl> <chr>       <dbl> <dbl> <dbl>   <dbl>
1  1976 AK         123574 0.357 0.579 0.0549
2  1976 AL        1182850 0.557 0.426 0.0163
3  1976 AR         767535 0.650 0.349 0.00134
# … with 558 more rows
For this example, we will look at the total number of votes in different states at different elections. Since we are only interested in the number of people who voted, we would like to create a custom version of the pres_results data frame that only contains the columns year, state and total_votes. For such filtering, we can use the select() function from the dplyr package.
The select() function takes a data frame as an input parameter and lets us decide which of its columns we want to keep. The output of the function is a data frame with all rows, but containing only the columns we explicitly select.
We can reduce our dataset to only year, state and total_votes in the following way:
select(pres_results, year, state, total_votes)
# A tibble: 561 x 3
   year state total_votes
  <dbl> <chr>       <dbl>
1  1976 AK         123574
2  1976 AL        1182850
3  1976 AR         767535
# … with 558 more rows
As the first parameter we passed the pres_results data frame; as the remaining parameters we passed the columns we want to keep to select().
Apart from keeping the columns we want, the select() function also keeps them in the same order as we specified in the function parameters.
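To try this without installing the politicaldata package, the following is a minimal sketch. It assumes a hand-typed stand-in called pres_small (a hypothetical name, not part of any package) that holds only the three rows printed earlier. It checks that select() keeps every row, returns only the named columns, and leaves the input data frame unchanged.

```r
library(dplyr)

# Hypothetical stand-in for pres_results: just the three rows shown in the
# printed output, typed in by hand for illustration.
pres_small <- tibble(
  year        = c(1976, 1976, 1976),
  state       = c("AK", "AL", "AR"),
  total_votes = c(123574, 1182850, 767535),
  dem         = c(0.357, 0.557, 0.650),
  rep         = c(0.579, 0.426, 0.349),
  other       = c(0.0549, 0.0163, 0.00134)
)

# Keep all rows, but only the three named columns.
votes <- select(pres_small, year, state, total_votes)

names(votes)       # "year" "state" "total_votes"
nrow(votes)        # 3, the same number of rows as the input
ncol(pres_small)   # 6, the input itself is not modified
```

Because select() returns a new data frame, the result has to be assigned to a name (here votes) if we want to keep working with it.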
If we change the order of the parameters when we call the function, the columns of the output change accordingly:
select(pres_results, total_votes, year, state)
# A tibble: 561 x 3
  total_votes  year state
        <dbl> <dbl> <chr>
1      123574  1976 AK
2     1182850  1976 AL
3      767535  1976 AR
# … with 558 more rows
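The effect of argument order can also be checked programmatically. The sketch below again assumes a small hand-typed stand-in, pres_small (a hypothetical name used only for illustration), and compares the column names produced by the two calls. Only the column positions differ, while the values inside each column stay the same.

```r
library(dplyr)

# Hypothetical stand-in for pres_results, reduced to the columns of interest.
pres_small <- tibble(
  year        = c(1976, 1976, 1976),
  state       = c("AK", "AL", "AR"),
  total_votes = c(123574, 1182850, 767535)
)

# Same columns, requested in two different orders.
by_year  <- select(pres_small, year, state, total_votes)
by_votes <- select(pres_small, total_votes, year, state)

names(by_year)    # "year" "state" "total_votes"
names(by_votes)   # "total_votes" "year" "state"

# The data in each column is untouched by the reordering.
same_values <- identical(by_year$total_votes, by_votes$total_votes)
same_values       # TRUE
```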