In data analysis and statistics, tabular data is the most important data structure. It is present in many common formats, such as Excel files, comma-separated values (CSV) files, and databases. R integrates tabular data objects as first-class citizens of the language through data frames. Data frames allow users to easily read and manipulate tabular data within the R language.
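As a minimal sketch of reading tabular data into a data frame, the snippet below parses a small piece of CSV text with `read.csv()`. The CSV content and the name `people` are made up for illustration; in practice the first argument would usually be a file path rather than `text =`.

```r
# Hypothetical CSV content, used only for illustration.
csv_text <- "sex,weight,height
M,77,182
F,58,161
F,53,161"

# read.csv() parses the text into a data frame; `text =` stands in for a file path.
people <- read.csv(text = csv_text)

is.data.frame(people)  # checks that the result is a data frame
str(people)            # compact overview of the columns and their types
```

The header line of the CSV becomes the column names, and each following line becomes one row.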
Let’s take a look at a data frame object named Davis, from the package carData, which includes height and weight measurements for 200 men and women:
      sex weight height repwt repht
    1   M     77    182    77   180
    2   F     58    161    51   159
    3   F     53    161    54   158
     [ reached 'max' / getOption("max.print") -- omitted 197 rows ]
From the printed output we can see that the data frame spans 200 rows (3 printed, 197 omitted) and 5 columns. In the example above, each row contains the data of one person through attributes, which correspond to the columns sex, weight, height, reported weight repwt, and reported height repht.
For example, the first row in the table specifies a male who weighs 77kg and has a height of 182cm. The reported values are very close, at 77kg and 180cm.
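The row and column structure described above can be explored with a few base R functions. The sketch below rebuilds only the three printed rows by hand so that it runs without the carData package; the name `davis_head` is hypothetical, and the full dataset has 200 rows rather than 3.

```r
# Rebuild the three printed rows by hand (davis_head is a hypothetical name);
# the full dataset would come from the carData package.
davis_head <- data.frame(
  sex    = c("M", "F", "F"),
  weight = c(77, 58, 53),
  height = c(182, 161, 161),
  repwt  = c(77, 51, 54),
  repht  = c(180, 159, 158)
)

dim(davis_head)        # number of rows, then number of columns
names(davis_head)      # column names, i.e. the attributes of each person
davis_head[1, ]        # first row: one person with all five attributes
davis_head$weight[1]   # a single value: the measured weight of the first person
```

Indexing with `[row, ]` returns a whole row as a one-row data frame, while `$column` returns a single column as a vector.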
The rows in a data frame are further identified by row names on the left, which are simply the row numbers by default. In the case of the Davis dataset above, the row names range from 1 to 200.
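To illustrate row names, here is a small sketch on a hand-built data frame. The object `df` and the labels `p1` to `p3` are invented for this example and are not part of the Davis dataset.

```r
# A small hand-built data frame (hypothetical, for illustration only).
df <- data.frame(weight = c(77, 58, 53), height = c(182, 161, 161))

rownames(df)   # default row names: the row numbers, stored as character strings

# Row names can be replaced by custom labels, which must be unique.
rownames(df) <- c("p1", "p2", "p3")

df["p2", ]     # rows can then be selected by name
```

Selecting by name and selecting by position (`df[2, ]`) return the same row here.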