# IMDB Ratings: TV's golden age is real

## Data Description

The data is curated courtesy of Sara Stoudt and comes from The Economist's recently launched data repository on GitHub!

Their November 24th article on TV ratings covers ‘all TV dramas … via IMDb from 1990 to 2018’.

### Data Dictionary

Data dictionary courtesy of `skimr` and `kable`, with credit to Phillip Knor for the pull request.

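| type | variable | missing | complete | n | min | max |
|---|---|---|---|---|---|---|
| character | genres | 0 | 2266 | 2266 | 5 | 25 |
| character | title | 0 | 2266 | 2266 | 1 | 51 |
| character | titleId | 0 | 2266 | 2266 | 9 | 9 |
| Date | date | 0 | 2266 | 2266 | 1990-01-03 | 2018-10-10 |
| integer | seasonNumber | 0 | 2266 | 2266 | NA | NA |
| numeric | av_rating | 0 | 2266 | 2266 | NA | NA |
| numeric | share | 0 | 2266 | 2266 | NA | NA |

For the character variables, `min` and `max` are the shortest and longest string lengths.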

## 1. Data Sourcing/Exploration

Read the IMDb TV ratings data published by The Economist:

``library(tidyverse)``

``dat <- read_csv("data/IMDb_Economist_tv_ratings.csv")``

### 1. Data Exploration

First, let's answer the following questions:

1. What is the series with the longest duration (by date)? Hint: You can use `diff(range(date))` to calculate duration.
2. Which series has the highest/lowest average rating?
3. Which series has the most seasons (rows) in the data?
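A minimal `dplyr` sketch of the three questions above. It runs on a small hypothetical `toy` tibble with made-up values and the column names from the data dictionary; substitute `dat` to answer the questions on the real data. It assumes each row is one season of a series, so the third question is approximated by counting rows per series, and it groups by `titleId` as well as `title` in case two different shows share a title.

```r
library(dplyr)

# Hypothetical toy rows with made-up values; columns follow the data dictionary.
# Replace `toy` with `dat` from read_csv() to use the real data.
toy <- tibble(
  titleId      = c("tt01", "tt01", "tt01", "tt02", "tt02", "tt03"),
  title        = c("Show A", "Show A", "Show A", "Show B", "Show B", "Show C"),
  seasonNumber = c(1L, 2L, 3L, 1L, 2L, 1L),
  date         = as.Date(c("1995-01-01", "1998-06-01", "2004-03-01",
                           "2010-09-01", "2011-09-01", "2015-01-01")),
  av_rating    = c(7.5, 8.0, 8.4, 9.1, 9.3, 6.2)
)

# 1. Duration per series: days between its first and last date
duration <- toy %>%
  group_by(titleId, title) %>%
  summarise(days = as.numeric(diff(range(date)), units = "days"),
            .groups = "drop") %>%
  arrange(desc(days))

# 2. Unweighted mean of the per-season av_rating values, highest first
ratings <- toy %>%
  group_by(titleId, title) %>%
  summarise(mean_rating = mean(av_rating), .groups = "drop") %>%
  arrange(desc(mean_rating))

# 3. Number of rows (seasons) per series, largest first
counts <- toy %>% count(titleId, title, sort = TRUE)

head(duration, 1)  # longest-running series
head(ratings, 1)   # highest average rating
tail(ratings, 1)   # lowest average rating
head(counts, 1)    # most seasons
```

The mean here weights every season equally; weighting by `share` would be a different, equally defensible definition of "average rating".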

## 2. Ratings for Selected Series

We would now like to plot and compare the ratings of the following series in the dataset:

``series <- c("Star Trek", "Breaking Bad", "Game of Thrones", "Sopranos", "Big Bang")``

1. Use `filter()` to reduce the dataset to the series above. `str_detect()` tests whether a pattern occurs in `title`; to match several series at once, build a single pattern with `paste(series, collapse = "|")`.
2. Plot the average rating `av_rating` of each series over `date`. You can color by `title` and set the point size by `share`, as in the plot below.
3. Which other series would be interesting to plot?
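The filter-and-plot steps can be sketched as follows. The `toy` tibble, its placeholder titles ("Show A", "The Show B", "Show C") and its values are hypothetical stand-ins; on the real data you would use `dat` and the `series` vector above. The sketch shows that `str_detect()` matches substrings, which is why a short name like "Sopranos" also picks up a longer title containing it.

```r
library(dplyr)
library(stringr)
library(ggplot2)

# Hypothetical toy rows with made-up values; replace `toy` with `dat` and
# `series` with the vector above to plot the real series.
toy <- tibble(
  title     = c("Show A", "Show A", "The Show B", "The Show B", "Show C"),
  date      = as.Date(c("2001-01-01", "2003-01-01", "2008-06-01",
                        "2010-06-01", "2012-01-01")),
  av_rating = c(7.8, 8.3, 8.9, 9.2, 6.5),
  share     = c(1.0, 2.5, 3.0, 4.2, 0.5)
)
series <- c("Show A", "Show B")

# paste(..., collapse = "|") builds the regex "Show A|Show B", so
# str_detect() keeps any title containing one of the names ("The Show B" too)
selected <- toy %>%
  filter(str_detect(title, paste(series, collapse = "|")))

# Rating over time: one colour per series, point size mapped to share
p <- ggplot(selected, aes(x = date, y = av_rating, colour = title)) +
  geom_line() +
  geom_point(aes(size = share)) +
  labs(x = "Date", y = "Average rating", colour = "Series", size = "Share")

print(p)
```

Because the pattern is a regular expression, a series name containing characters such as `.`, `?` or `(` would need escaping, and matching is case-sensitive.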

## 3. Ratings by Genre

Plot the average rating of each genre over time. Which genre seems to be the most/least popular?
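One way to approach this is sketched below, assuming that `genres` stores several comma-separated labels per row (for example "Crime,Drama"), which the 5 to 25 character lengths in the data dictionary suggest but do not confirm. The `toy` tibble and its values are hypothetical; replace it with `dat`. `tidyr::separate_rows()` splits each row into one row per genre, and ratings are then averaged per genre and calendar year.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)

# Hypothetical toy rows with made-up values; assumes `genres` holds
# comma-separated labels such as "Crime,Drama". Replace `toy` with `dat`.
toy <- tibble(
  date      = as.Date(c("2001-03-01", "2001-09-01", "2012-05-01", "2012-11-01")),
  av_rating = c(7.0, 8.0, 8.5, 7.5),
  genres    = c("Drama", "Crime,Drama", "Comedy,Drama", "Comedy,Drama")
)

by_genre <- toy %>%
  separate_rows(genres, sep = ",") %>%   # one row per (season, genre) pair
  rename(genre = genres) %>%
  mutate(year = year(date)) %>%          # yearly bins keep the lines readable
  group_by(genre, year) %>%
  summarise(mean_rating = mean(av_rating), n = n(), .groups = "drop")

p <- ggplot(by_genre, aes(x = year, y = mean_rating, colour = genre)) +
  geom_line() +
  geom_point(aes(size = n)) +
  labs(x = "Year", y = "Average rating", colour = "Genre", size = "Seasons")

print(p)
```

A season counts toward every genre it lists, so the genre lines are not independent. "Popular" can also be read as the number of seasons per genre (`n`) rather than their mean rating.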

## 4. Forecast Number of Ratings

We would now like to examine the overall pattern of TV ratings over time and forecast the number of ratings for the next 12 months.

1. First, aggregate the number of ratings over time. To aggregate monthly, round `date` with `ceiling_date(date, "month")`, `group_by()` the rounded date, then `summarise()` and count observations with `n()`.
2. Filter the data from `2008-01-01` onwards.
3. Convert the data frame to a `ts()` object with `frequency = 12`.
4. Use your favorite forecasting technique, e.g. ARIMA with `auto.arima()` or neural networks with `nnetar()`, to forecast the time series.
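The four steps above can be sketched end to end as follows. The `toy` dates are simulated with `sample()` as a hypothetical stand-in for `dat$date`. With lubridate's default settings, `ceiling_date()` on a `Date` labels each month by the first day of the following month (`floor_date()` would label it by its own first day; either works as long as it is used consistently). Months with no ratings are filled with zero via `tidyr::complete()`, an extra step not in the list above, so that the `ts` has no silent gaps.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(forecast)

# Hypothetical toy data: simulated dates standing in for dat$date
set.seed(1)
toy <- tibble(date = as.Date("2005-01-01") + sample(0:(365 * 12), 600, replace = TRUE))

monthly <- toy %>%
  mutate(month = ceiling_date(date, "month")) %>%  # 1. round up to the month
  group_by(month) %>%
  summarise(n = n(), .groups = "drop") %>%
  filter(month >= as.Date("2008-01-01")) %>%       # 2. keep 2008 onwards
  complete(month = seq(min(month), max(month), by = "month"),
           fill = list(n = 0L)) %>%                # months without ratings -> 0
  arrange(month)

# 3. Monthly series starting at the first month's year and month
first <- min(monthly$month)
y <- ts(monthly$n, start = c(year(first), month(first)), frequency = 12)

# 4. Fit two models and forecast 12 months ahead
fit_arima <- auto.arima(y)
fit_nn    <- nnetar(y)
fc_arima  <- forecast(fit_arima, h = 12)
fc_nn     <- forecast(fit_nn, h = 12)

plot(fc_arima)
```

`nnetar()` starts from random weights, so its forecasts change slightly between runs unless a seed is set.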

How do the different forecasting techniques compare? What (seasonal) patterns do you see in the time series?
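One common way to compare the techniques, used here as an assumption rather than the method the exercise prescribes, is to hold out the last 12 months, fit each model on the remainder, and compare test-set errors with `forecast::accuracy()`. The monthly series `y` below is simulated (Poisson counts with a yearly cycle) as a hypothetical stand-in for the series built in the previous step; `monthplot()` from base R's `stats` package then shows the average level of each calendar month.

```r
library(forecast)

# Hypothetical monthly counts with a yearly cycle, standing in for `y`
set.seed(42)
y <- ts(rpois(120, lambda = 20 + 5 * sin(2 * pi * (1:120) / 12)),
        start = c(2008, 1), frequency = 12)

# Hold out the final 12 months and fit on the rest
train <- window(y, end = c(2016, 12))
test  <- window(y, start = c(2017, 1))

fc_arima <- forecast(auto.arima(train), h = 12)
fc_nn    <- forecast(nnetar(train), h = 12)

# Test-set RMSE and MAE for each method (lower is better)
acc <- rbind(
  arima  = accuracy(fc_arima, test)["Test set", c("RMSE", "MAE")],
  nnetar = accuracy(fc_nn, test)["Test set", c("RMSE", "MAE")]
)
print(acc)

# Seasonal pattern: one panel per calendar month, with its mean as a bar
monthplot(train)
```

A single 12-month holdout is noisy; `forecast::tsCV()` gives a rolling-origin alternative if a more stable comparison is needed.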