Introduction to Data Frames
In data analysis and statistics, tabular data is one of the most important data structures. It appears in many common formats, such as Excel files, comma-separated values (CSV) files, and databases. R integrates tabular data objects as first-class citizens of the language through data frames. Data frames allow users to easily read and manipulate tabular data within the R language.
Let’s take a look at a data frame object named Davis, from the package carData, which includes height and weight measurements for 200 men and women:
Davis
  sex weight height repwt repht
1   M     77    182    77   180
2   F     58    161    51   159
3   F     53    161    54   158
 [ reached 'max' / getOption("max.print") -- omitted 197 rows ]
From the printed output we can see that the data frame has 200 rows (3 printed, 197 omitted) and 5 columns. In the example above, each row describes one person through attributes that correspond to the columns: sex, weight, height, reported weight repwt, and reported height repht.
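The row and column counts can also be queried directly instead of being read off the printed output. The following is a minimal sketch: davis_head is a hypothetical stand-in that holds only the three rows printed above, so that the example runs without the carData package. With carData installed, the same calls on the full Davis object should report the 200 rows and 5 columns described in the text.

```r
# Hypothetical stand-in for the first three rows of Davis,
# copied from the printed output above (not the full dataset).
davis_head <- data.frame(
  sex    = c("M", "F", "F"),
  weight = c(77, 58, 53),
  height = c(182, 161, 161),
  repwt  = c(77, 51, 54),
  repht  = c(180, 159, 158)
)

dim(davis_head)    # number of rows, then number of columns
nrow(davis_head)   # number of rows only
ncol(davis_head)   # number of columns only
names(davis_head)  # the column names
```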
For example, the first row in the table describes a male (M) who weighs 77 kg and is 182 cm tall. His reported weight and height are very close to the measured values: 77 kg and 180 cm, respectively.
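The values discussed for the first row can be pulled out of the data frame by position and by column name. This sketch again uses davis_head, a hypothetical three-row stand-in built from the printed output, rather than the full Davis object from carData.

```r
# Hypothetical stand-in for the first three rows of Davis,
# copied from the printed output above (not the full dataset).
davis_head <- data.frame(
  sex    = c("M", "F", "F"),
  weight = c(77, 58, 53),
  height = c(182, 161, 161),
  repwt  = c(77, 51, 54),
  repht  = c(180, 159, 158)
)

first <- davis_head[1, ]  # the first row, as a one-row data frame
first

first$weight  # measured weight in kg
first$height  # measured height in cm

# Differences between measured and reported values for the first person
weight_gap <- first$weight - first$repwt
height_gap <- first$height - first$repht
```

Indexing with [row, column] selects by position or name, while the $ operator extracts a single column as a vector.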
The rows in a data frame are further identified by row names on the left, which are simply the row numbers by default. In the case of the Davis dataset above, the row names range from 1 to 200.
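Row names can be inspected with rownames(). The sketch below uses davis_head, a hypothetical three-row stand-in built from the printed output, so its row names run from 1 to 3 rather than 1 to 200. Note that R returns the default row names as character strings, and that a row can be selected by its name as well as by its position.

```r
# Hypothetical stand-in for the first three rows of Davis,
# copied from the printed output above (not the full dataset).
davis_head <- data.frame(
  sex    = c("M", "F", "F"),
  weight = c(77, 58, 53),
  height = c(182, 161, 161),
  repwt  = c(77, 51, 54),
  repht  = c(180, 159, 158)
)

row_ids <- rownames(davis_head)  # default row names: the row numbers, as strings
row_ids

by_name     <- davis_head["2", ]  # select the row whose name is "2"
by_position <- davis_head[2, ]    # select the second row by position
```

With default row names the two selections return the same row; they would differ only if the row names had been changed or the rows reordered.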