Titanic data with passenger names and other details removed.
Titanic data with passenger names and other details removed.
A data frame with 1046 observations on 6 variables.
pclass |
passenger class, unordered factor: 1st 2nd 3rd |
survived |
factor: died or survived |
sex |
unordered factor: male female |
age |
age in years, min 0.167 max 80.0 |
sibsp |
number of siblings or spouses aboard, integer: 0...8 |
parch |
number of parents or children aboard, integer: 0...6 |
The dataset was compiled by Frank Harrell and Robert Dawson:
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.html.
For this version of the Titanic data, passenger details were deleted,
survived was cast as a factor,
and the name changed to ptitanic to minimize confusion with
other versions.
In this data the crew are conspicuous by their absence.
Contents of ptitanic:
pclass survived sex age sibsp parch
1 1st survived female 29.000 0 0
2 1st survived male 0.917 1 2
3 1st died female 2.000 1 2
4 1st died male 30.000 1 2
5 1st died female 25.000 1 2
...
1309 3rd died male 29.000 0 0How ptitanic was built:
load("titanic3.sav") # from Dr. Harrell's web site
# discard name, ticket, fare, cabin, embarked, body, home.dest
ptitanic <- titanic3[,c(1,2,4,5,6,7)]
# change survived from integer to factor
ptitanic$survived <- factor(ptitanic$survived, labels = c("died", "survived"))
save(ptitanic, file = "ptitanic.rda")data(ptitanic) summary(ptitanic) # survival rate was greater for females rpart.rules(rpart(survived ~ sex, data = ptitanic)) # survival rate was greater for higher classes rpart.rules(rpart(survived ~ pclass, data = ptitanic)) # survival rate was greater for children rpart.rules(rpart(survived ~ age, data = ptitanic)) # main indicator of missing data is 3rd class esp. with many children obs.with.nas <- rowSums(is.na(ptitanic)) > 0 rpart.rules(rpart(obs.with.nas ~ ., data = ptitanic, method = "class"))
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.