## Introduction

Classification trees partition the predictor space recursively, and candidate splits are evaluated using either the Gini index or *entropy*.

The **Gini index** is defined by \[G = \sum_{k=1}^{K} \hat{p}_{mk}(1 - \hat{p}_{mk}),\] a measure of total variance across the \(K\) classes. Here \(\hat{p}_{mk}\) represents the proportion of training observations in the \(m\)th region that are from the \(k\)th class.

It is not hard to see that the Gini index takes on a small value if all of the \(\hat{p}_{mk}\)’s are close to zero or one. For this reason the Gini index is referred to as a measure of node *purity*—a small value indicates that a node contains predominantly observations from a single class.

An alternative to the Gini index is **entropy**, given by

\[D = -\sum_{k=1}^{K} \hat{p}_{mk}\log \hat{p}_{mk}\]

Since \(0 \le \hat{p}_{mk} \le 1\), it follows that \(-\hat{p}_{mk}\log \hat{p}_{mk} \ge 0\). One can show that the entropy will take on a value near zero if the \(\hat{p}_{mk}\)’s are all near zero or near one. Therefore, like the Gini index, the entropy will take on a small value if the \(m\)th node is pure. In fact, it turns out that the Gini index and the entropy are quite similar numerically.
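To see this numerical similarity concretely, here is a small illustrative sketch (not part of the original lab) comparing the two impurity measures for a two-class node as the class-1 proportion \(p\) varies:

```r
# Compare Gini index and entropy for a two-class node as a function of
# the proportion p of class-1 observations (purely illustrative).
p <- seq(0.01, 0.99, by = 0.01)
gini <- 2 * p * (1 - p)                        # G = sum_k p_k(1 - p_k) with K = 2
entropy <- -(p * log(p) + (1 - p) * log(1 - p))

plot(p, entropy, type = "l", xlab = "p", ylab = "Impurity")
lines(p, gini, lty = 2)
legend("topright", legend = c("Entropy", "Gini"), lty = c(1, 2))
```

Both curves vanish as \(p\) approaches 0 or 1 and peak at \(p = 0.5\), which is exactly the "small value at pure nodes" behavior described above.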

## Fitting Classification Trees

We now fit a classification tree to the `Carseats` data set, which contains children’s car seat sales data. We will attempt to predict **Sales** (child car seat sales) at 400 locations based on a number of predictors. We first transform the numeric **Sales** variable into a factor indicating whether sales were high.
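A minimal sketch of this step, assuming the `Carseats` data from the `ISLR` package, the `tree` package, and the conventional cutoff of 8 (thousand units) for "high" sales:

```r
library(ISLR)   # provides the Carseats data
library(tree)

# Recode Sales as a binary factor: "Yes" if sales exceed 8 (thousand units)
Carseats$High <- as.factor(ifelse(Carseats$Sales <= 8, "No", "Yes"))

# Fit a classification tree using all predictors except Sales itself
tree.carseats <- tree(High ~ . - Sales, data = Carseats)
summary(tree.carseats)
```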

We see that the training error rate is 9%. For classification trees, the deviance reported in the output of `summary()` is given by

\[-2 \sum_{m} \sum_{k} n_{mk}\log \hat{p}_{mk},\]

where \(n_{mk}\) is the number of observations in the \(m\)th terminal node that belong to the \(k\)th class.

## Plotting Classification Trees

Now, let’s plot the tree stored in `tree.carseats`:
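A minimal sketch using base R plotting for `tree` objects:

```r
# Draw the tree structure, then overlay the split and leaf labels
plot(tree.carseats)
text(tree.carseats, pretty = 0)  # pretty = 0 shows full category names at splits
```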

Using `rpart()` instead of `tree()` and the package `rpart.plot`, we can get a nicer-looking tree output:
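One possible version of that refit (object names here are illustrative):

```r
library(rpart)
library(rpart.plot)

# Refit the same model with rpart(); method = "class" requests a classification tree
rpart.carseats <- rpart(High ~ . - Sales, data = Carseats, method = "class")

# rpart.plot() draws a cleaner tree with colored, labeled nodes
rpart.plot(rpart.carseats)
```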

## Fitting Classification Tree using a Test Set

Instead of reporting the in-sample performance (an accuracy of 91%), we should now estimate the out-of-sample performance:
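A sketch of the train/test workflow, assuming a 200/200 split of the 400 observations (the seed and object names are illustrative):

```r
set.seed(2)  # for a reproducible split; the seed value is arbitrary

# Split the observations into a training half and a held-out test half
train <- sample(1:nrow(Carseats), 200)
Carseats.test <- Carseats[-train, ]
High.test <- Carseats$High[-train]

# Refit on the training data only, then predict the held-out labels
tree.carseats <- tree(High ~ . - Sales, data = Carseats, subset = train)
tree.pred <- predict(tree.carseats, Carseats.test, type = "class")

table(tree.pred, High.test)   # confusion matrix on the test set
mean(tree.pred == High.test)  # out-of-sample accuracy
```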

## Fitting a Classification Tree using Pruning and Cross Validation

Next, we consider whether pruning the tree might lead to improved results. We use the argument `FUN = prune.misclass` to indicate that we want the classification error rate to guide the cross-validation and pruning process, rather than the default for the `cv.tree()` function, which is **deviance**.
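A minimal sketch of this cross-validation step (the seed is arbitrary):

```r
set.seed(3)  # cross-validation assigns observations to folds at random

# Cross-validate the fitted tree, guided by classification error
cv.carseats <- cv.tree(tree.carseats, FUN = prune.misclass)

# $size is the number of terminal nodes of each candidate tree;
# despite its name, $dev here is the cross-validated misclassification count
cv.carseats
plot(cv.carseats$size, cv.carseats$dev, type = "b")
```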

## Apply Pruning

We now apply the `prune.misclass()` function in order to prune the tree and obtain the nine-node tree.
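A sketch of that call, assuming nine terminal nodes was the size selected by cross-validation above:

```r
# Prune back to the nine-node tree identified by cross-validation
prune.carseats <- prune.misclass(tree.carseats, best = 9)
plot(prune.carseats)
text(prune.carseats, pretty = 0)
```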

## Evaluate Pruned Tree

How well does this pruned tree perform on the test data set? Once again, we apply the `predict()` function.
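A minimal sketch, reusing the test objects defined earlier:

```r
tree.pred <- predict(prune.carseats, Carseats.test, type = "class")
table(tree.pred, High.test)   # confusion matrix for the pruned tree
mean(tree.pred == High.test)  # test-set accuracy of the pruned tree
```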

## Increase the Value of `best`

Let’s increase the value of `best` to obtain a larger pruned tree. Did we increase the classification accuracy?
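A sketch of this comparison; `best = 15` is an arbitrary illustrative value, not one taken from the original lab:

```r
# Prune less aggressively to obtain a larger tree
prune.carseats <- prune.misclass(tree.carseats, best = 15)
tree.pred <- predict(prune.carseats, Carseats.test, type = "class")
mean(tree.pred == High.test)  # compare with the nine-node tree's accuracy
```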