Previous chapter
Read Excel FilesIntroduction
Next chapter

Data Sources

I think, therefore I R. William B. King (in his R tutorials)

R is known to be very flexible importing massive amounts of data from numerous different data sources and formats. The most popular data sources and formats are

  • Text Files: Files including text (ASCII), most commonly tabular
  • Excel Files: Microsoft Excel files (.xls, .xlsx)
  • Databases: Oracle, PostgreSQL SQLite, etc.
  • Internet: R as a Web Client
  • Other: XML, JSON, etc.

R and Excel: History

R has a rather awkward relationship with MS Excel throughout its history. Due to its open-source background the “best practice” approach interacting wich Excel files has long been to export the contents to comma-separated files (CSV). However, this approach requires manual interventions and is far from practical. The RODBC package also allows users to create ODBC connection to MS Excel files.

The complication with MS Excel files is not only their proprietary nature. Although the curent Excel data format is based on open technologies like XML (an Excel file is basically a compressed XML file) it comes with a lot of special cases attached.

Various R packages are able to interact with Excel files. Most of them use external libraries in C++, Java, etc. Especially Java based dependencies can be more difficult to install on specific platforms like in the case of the XLConnect.

A completely different category represents the RExcel package which allows Excel to connect to an R instance through a (DCOM) interface.

Excel Packages

There are numerous to interact with Excel from R:

Package / MethodDescription
read.table("clipboard")Transfer through clipboard
readxlRead excel files without external dependencies
writexlWrite xlsx files without external dependencies
XLConnect, xlsx (a)Read/Write content through Java library
gdata (b)Read contents through CSV
RODBCRead data through ODBC
RExcel (c)2-way transfer through (D)COM Server
openxlsxRead, Write and Edit XLSX Files

-(a): Uses on Apache POI Java library - (b): Uses perl bindings to read/write Excel files - (c): Requires external software installation / admin rights

Selected Excel Packages and Methods

  • readxl: Read-only access to Excel files
    • PRO: Fast, no external language dependencies required
    • CON: Read-only
  • writexl: Write data to xlsx files
    • PRO: Very Fast, no external language dependencies required
    • CON: Write-only
  • openxlsx: Write data to xlsx files
    • PRO: Very Fast, no external language dependencies required
    • CON: Write-only
  • read.table("clipboard")
    • PRO: Quick data import for analysis
    • CON: Not reproducible/efficient

read.table("clipboard")

  1. Select area of interest in an Excel worksheet
  2. Copy to clipboard (STRG-C)
  3. Run read.table("clipboard", sep = "\t", ...) in R

Question

Try to read the ‘mtcars’ sheet from data-sources-04.xlsx into an R data.frame using STRG-C and read.table(). Compare result using all.equal() with mtcars.

  1. Select entire area in Excel Sheet
  2. Copy area into clipboard using Edit->Copy or STRG-C
  3. Paste Contents using fromclip <- read.table(file = "clipboard", sep = "\t")
  4. Compare using all.equal(fromclip, mtcars)

Bonus Question

Show how to read the same sheet using readxl by clicking on the file (Import) in RStudio.