Introduction to Data Frames

In data analysis and statistics, tabular data is one of the most widely used data structures. It appears in many common formats, such as Excel files, comma-separated values (CSV) files, and database tables. R treats tabular data as a first-class citizen of the language through data frames, which make it easy to read and manipulate tabular data directly in R.

Let’s take a look at a data frame named Davis, from the package carData, which includes height and weight measurements for 200 men and women:

Input
Davis
Output
  sex weight height repwt repht
1   M     77    182    77   180
2   F     58    161    51   159
3   F     53    161    54   158
 [ reached 'max' / getOption("max.print") -- omitted 197 rows ]

From the printed output we can see that the data frame has 200 rows (3 printed, 197 omitted) and 5 columns. Each row describes one person, and each column holds one attribute of that person: sex, weight, height, reported weight (repwt), and reported height (repht).
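The dimensions and column names read off the printout can also be queried directly. The following is a minimal sketch, assuming the carData package is already installed; it only uses base R inspection functions.

```r
# Assumption: carData is installed, e.g. via install.packages("carData")
library(carData)

dim(Davis)    # rows and columns: 200 5
nrow(Davis)   # number of rows
ncol(Davis)   # number of columns
names(Davis)  # column names: sex, weight, height, repwt, repht
```

Querying the shape this way is handy for larger data frames, where the printed output is truncated.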

For example, the first row in the table describes a male who weighs 77 kg and is 182 cm tall. His reported values are very close: 77 kg and 180 cm, respectively.
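To read these values programmatically rather than from the printout, a single row can be extracted by position. This is a small sketch, again assuming carData is installed; the variable name first is only illustrative.

```r
# Assumption: carData is installed
library(carData)

first <- Davis[1, ]           # first row, all columns (still a data frame)
first$weight                  # measured weight in kg: 77
first$height                  # measured height in cm: 182
first$height - first$repht    # measured minus reported height in cm: 2
```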

The rows in a data frame are further identified by row names, shown on the left, which default to the row numbers. In the Davis dataset above, the row names range from 1 to 200.
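Row names can be inspected with rownames(), which returns them as character strings. The sketch below, assuming carData is installed, shows that with default row names, selecting by name and selecting by position pick out the same row.

```r
# Assumption: carData is installed
library(carData)

head(rownames(Davis), 3)   # first three row names, as character strings

by_name     <- Davis["2", ]  # select the row whose name is "2"
by_position <- Davis[2, ]    # select the second row by position
```

Both selections refer to the same person here only because the row names are the default row numbers; after filtering or reordering, names and positions can diverge.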

Build a data frame from vectors