readxl: read_excel – R documentation

read_excel

Read xls and xlsx files

Description

Read xls and xlsx files

read_excel() calls excel_format() to determine if path is xls or xlsx, based on the file extension and the file itself, in that order. Use read_xls() and read_xlsx() directly if you know better and want to prevent such guessing.
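The format detection described above can be checked by hand. The following is a minimal sketch, assuming the bundled example workbooks `datasets.xlsx` and `datasets.xls` are available through `readxl_example()`; the variable names are illustrative only.

```r
library(readxl)

# Example workbooks that ship with readxl (assumed to be installed with the package)
xlsx_path <- readxl_example("datasets.xlsx")
xls_path  <- readxl_example("datasets.xls")

# excel_format() reports the format that read_excel() would settle on
fmt_xlsx <- excel_format(xlsx_path)
fmt_xls  <- excel_format(xls_path)

# When the format is already known, call the specific reader and skip the guessing
from_xlsx <- read_xlsx(xlsx_path)
from_xls  <- read_xls(xls_path)
```

Calling `read_xls()` or `read_xlsx()` directly is mainly useful when a file carries a misleading or missing extension.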

Usage

read_excel(path, sheet = NULL, range = NULL, col_names = TRUE,
  col_types = NULL, na = "", trim_ws = TRUE, skip = 0,
  n_max = Inf, guess_max = min(1000, n_max),
  progress = readxl_progress(), .name_repair = "unique")

read_xls(path, sheet = NULL, range = NULL, col_names = TRUE,
  col_types = NULL, na = "", trim_ws = TRUE, skip = 0,
  n_max = Inf, guess_max = min(1000, n_max),
  progress = readxl_progress(), .name_repair = "unique")

read_xlsx(path, sheet = NULL, range = NULL, col_names = TRUE,
  col_types = NULL, na = "", trim_ws = TRUE, skip = 0,
  n_max = Inf, guess_max = min(1000, n_max),
  progress = readxl_progress(), .name_repair = "unique")

Arguments

`path`	Path to the xls/xlsx file.
`sheet`	Sheet to read. Either a string (the name of a sheet), or an integer (the position of the sheet). Ignored if the sheet is specified via `range`. If neither argument specifies the sheet, defaults to the first sheet.
`range`	A cell range to read from, as described in cell-specification. Includes typical Excel ranges like "B3:D87", possibly including the sheet name like "Budget!B2:G14", and more. Interpreted strictly, even if the range forces the inclusion of leading or trailing empty rows or columns. Takes precedence over `skip`, `n_max` and `sheet`.
`col_names`	`TRUE` to use the first row as column names, `FALSE` to get default names, or a character vector giving a name for each column. If the user provides `col_types` as a vector, `col_names` can have one entry per column, i.e. have the same length as `col_types`, or one entry per unskipped column.
`col_types`	Either `NULL` to guess all from the spreadsheet or a character vector containing one entry per column from these options: "skip", "guess", "logical", "numeric", "date", "text" or "list". If exactly one type is specified, it will be recycled. The content of a cell in a skipped column is never read and that column will not appear in the data frame output. The "list" type loads a column as a list of length 1 vectors, which are typed using the type guessing logic from `col_types = NULL`, but on a cell-by-cell basis.
`na`	Character vector of strings to interpret as missing values. By default, readxl treats blank cells as missing data.
`trim_ws`	Should leading and trailing whitespace be trimmed?
`skip`	Minimum number of rows to skip before reading anything, be it column names or data. Leading empty rows are automatically skipped, so this is a lower bound. Ignored if `range` is given.
`n_max`	Maximum number of data rows to read. Trailing empty rows are automatically skipped, so this is an upper bound on the number of rows in the returned tibble. Ignored if `range` is given.
`guess_max`	Maximum number of data rows to use for guessing column types.
`progress`	Display a progress spinner? By default, the spinner appears only in an interactive session, outside the context of knitting a document, and when the call is likely to run for several seconds or more. See `readxl_progress()` for more details.
`.name_repair`	Handling of column names. By default, readxl ensures column names are not empty and are unique. If the tibble package version is recent enough, there is full support for `.name_repair` as documented in `tibble::tibble()`. If an older version of tibble is present, readxl falls back to name repair in the style of tibble v1.4.2.
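Several of the arguments above interact. The sketch below illustrates three of those interactions, assuming the bundled `datasets.xlsx` workbook whose first sheet has five columns; the replacement column names `sepal_length` and `species` are made up for illustration.

```r
library(readxl)

datasets <- readxl_example("datasets.xlsx")

# n_max caps the number of data rows; the header row is not counted
head3 <- read_excel(datasets, n_max = 3)

# "skip" entries in col_types drop columns entirely.
# col_names supplies one name per unskipped column, and skip = 1 steps over
# the sheet's own header row, which would otherwise be read as data.
partial <- read_excel(
  datasets,
  col_types = c("numeric", "skip", "skip", "skip", "text"),
  col_names = c("sepal_length", "species"),
  skip = 1,
  n_max = 3
)

# range takes precedence over sheet, skip and n_max.
# readxl may emit a message noting that the sheet named in range is used.
ranged <- read_excel(
  datasets,
  sheet = 1,
  range = "mtcars!B1:D5",
  skip = 10,
  n_max = 1
)
```

Because `range` is interpreted strictly, the last call returns the rectangle it names regardless of the other three arguments.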

Value

A tibble

Examples

datasets <- readxl_example("datasets.xlsx")
read_excel(datasets)

# Specify sheet either by position or by name
read_excel(datasets, 2)
read_excel(datasets, "mtcars")

# Skip rows and use default column names
read_excel(datasets, skip = 148, col_names = FALSE)

# Recycle a single column type
read_excel(datasets, col_types = "text")

# Specify some col_types and guess others
read_excel(datasets, col_types = c("text", "guess", "numeric", "guess", "guess"))

# Accommodate a column with disparate types via col_types = "list"
df <- read_excel(readxl_example("clippy.xlsx"), col_types = c("text", "list"))
df
df$value
sapply(df$value, class)

# Limit the number of data rows read
read_excel(datasets, n_max = 3)

# Read from an Excel range using A1 or R1C1 notation
read_excel(datasets, range = "C1:E7")
read_excel(datasets, range = "R1C2:R2C5")

# Specify the sheet as part of the range
read_excel(datasets, range = "mtcars!B1:D5")

# Read only specific rows or columns
read_excel(datasets, range = cell_rows(102:151), col_names = FALSE)
read_excel(datasets, range = cell_cols("B:D"))

# Get a preview of column names
names(read_excel(readxl_example("datasets.xlsx"), n_max = 0))

if (utils::packageVersion("tibble") > "1.4.2") {
  ## exploit full .name_repair flexibility from tibble

  ## "universal" names are unique and syntactic
  read_excel(
    readxl_example("deaths.xlsx"),
    range = "arts!A5:F15",
    .name_repair = "universal"
  )

  ## specify name repair as a built-in function
  read_excel(readxl_example("clippy.xlsx"), .name_repair = toupper)

  ## specify name repair as a custom function
  my_custom_name_repair <- function(nms) tolower(gsub("[.]", "_", nms))
  read_excel(
    readxl_example("datasets.xlsx"),
    .name_repair = my_custom_name_repair
  )

  ## specify name repair as an anonymous function
  read_excel(
    readxl_example("datasets.xlsx"),
    sheet = "chickwts",
    .name_repair = ~ substr(.x, start = 1, stop = 3)
  )
}

readxl

Read Excel Files

v1.3.1

GPL-3

Authors

Hadley Wickham [aut] (<https://orcid.org/0000-0003-4757-117X>), Jennifer Bryan [aut, cre] (<https://orcid.org/0000-0002-6983-2759>), RStudio [cph, fnd] (Copyright holder of all R code and all C/C++ code without explicit copyright attribution), Marcin Kalicinski [ctb, cph] (Author of included RapidXML code), Komarov Valery [ctb, cph] (Author of included libxls code), Christophe Leitienne [ctb, cph] (Author of included libxls code), Bob Colbert [ctb, cph] (Author of included libxls code), David Hoerl [ctb, cph] (Author of included libxls code), Evan Miller [ctb, cph] (Author of included libxls code)
