The data is curated courtesy of Sara Stoudt and comes from the recently created The Economist Data GitHub!
Their November 24th article on TV ratings covers ‘all TV dramas … via IMDb from 1990 to 2018’.
Data dictionary courtesy of kable, with credit to Phillip Knor for the pull request.
1. Data Sourcing/Exploration
Step 0: Read Data
Read the IMDb Economist TV ratings data:
library(tidyverse)  # loads readr (read_csv), dplyr, stringr and ggplot2
dat <- read_csv("data/IMDb_Economist_tv_ratings.csv")
1. Data Exploration
First, let's answer the following questions:
- Which series has the longest duration (by date)? Hint: you can use diff(range(date)) to calculate the duration.
- Which series has the highest/lowest average rating?
- Which series has the most episodes?
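A minimal dplyr sketch of how these three questions could be answered with one grouped summary. It uses a small hypothetical toy table in place of `dat` (titles, dates and ratings are made up) and assumes the columns `title`, `date` and `av_rating` named elsewhere in this exercise. Note that `n()` counts rows per title; if each row of the real file is a season rather than an episode (worth checking first), this counts seasons.

```r
library(dplyr)

# Hypothetical toy rows standing in for `dat`; values are made up
dat <- tibble(
  title     = c("Show A", "Show A", "Show A", "Show B", "Show B"),
  date      = as.Date(c("1995-01-01", "1998-06-01", "2004-03-01",
                        "2010-01-01", "2011-01-01")),
  av_rating = c(8.0, 8.5, 7.5, 9.0, 8.0)
)

series_summary <- dat %>%
  group_by(title) %>%
  summarise(
    duration    = diff(range(date)),  # time between first and last row (difftime)
    mean_rating = mean(av_rating),    # average of the per-row ratings
    n_rows      = n()                 # number of rows per series
  )

series_summary %>% arrange(desc(duration)) %>% slice(1)     # longest duration
series_summary %>% arrange(desc(mean_rating)) %>% slice(1)  # highest average rating
series_summary %>% arrange(mean_rating) %>% slice(1)        # lowest average rating
series_summary %>% arrange(desc(n_rows)) %>% slice(1)       # most rows
```

On the real data, drop the toy `dat` and run the same pipeline on the table read in Step 0.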
2. Ratings for Selected Series
We would now like to plot and compare the ratings of the following series in the dataset:
series <- c("Star Trek", "Breaking Bad", "Game of Thrones", "Sopranos", "Big Bang")
Use filter() to reduce the dataset to only the series above. You can use str_detect() to match the series names in the title column. To detect multiple series at once, a pattern can be constructed as follows:
paste(series, collapse = "|").
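A short sketch of the filtering step, using a hypothetical toy `title` column rather than the real data. The collapsed pattern is a regular expression with alternation, and `str_detect()` matches anywhere inside a string, so "Sopranos" also matches "The Sopranos" and "Star Trek" matches any title containing it.

```r
library(dplyr)
library(stringr)

series <- c("Star Trek", "Breaking Bad", "Game of Thrones", "Sopranos", "Big Bang")

# Hypothetical toy titles standing in for dat$title
dat <- tibble(
  title = c("Star Trek: Voyager", "Breaking Bad", "The Sopranos",
            "The Big Bang Theory", "Friends")
)

# Regex alternation: "Star Trek|Breaking Bad|Game of Thrones|Sopranos|Big Bang"
pattern <- paste(series, collapse = "|")

# Keep rows whose title contains any of the series names
selected <- dat %>% filter(str_detect(title, pattern))
selected
```

Because matching is partial, it is worth inspecting `unique(selected$title)` on the real data to see which titles were actually picked up.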
- Plot the average rating av_rating for each series over date. You can color by title and set the point size by share, as in the plot below.
- What other series would be interesting to plot?
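One possible ggplot2 sketch of the requested plot. The data frame below is a hypothetical, already-filtered toy table (made-up values) with the `title`, `date`, `av_rating` and `share` columns named above. `size` is mapped only inside `geom_point()` so the connecting lines keep a constant width.

```r
library(ggplot2)

# Hypothetical toy rows standing in for the filtered data; values are made up
selected <- data.frame(
  title     = c("Breaking Bad", "Breaking Bad", "Breaking Bad",
                "The Sopranos", "The Sopranos"),
  date      = as.Date(c("2008-02-01", "2009-04-01", "2010-05-01",
                        "1999-02-01", "2000-02-01")),
  av_rating = c(8.5, 8.7, 8.9, 8.6, 8.4),
  share     = c(0.5, 1.0, 1.5, 2.0, 2.5)
)

p <- ggplot(selected, aes(x = date, y = av_rating, color = title)) +
  geom_line() +                    # connect rows of the same series over time
  geom_point(aes(size = share)) +  # point size encodes share
  labs(x = "Date", y = "Average rating", color = "Series", size = "Share")

p  # printing the plot object draws it
```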
3. Ratings by Genre
Plot the average ratings by genre over date. Which genre seems to be the most/least popular?
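A sketch of one way to average ratings by genre per year. It assumes a `genres` column holding comma-separated labels such as "Crime,Drama" (an assumption about the file's format; check it first) and uses hypothetical toy rows. `tidyr::separate_rows()` splits each row into one row per genre so that a multi-genre series counts toward every genre it lists.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)

# Hypothetical toy rows standing in for `dat`; assumes `genres` holds
# comma-separated labels such as "Crime,Drama"
dat <- tibble(
  date      = as.Date(c("2000-05-01", "2000-09-01", "2010-03-01")),
  av_rating = c(8.0, 7.0, 9.0),
  genres    = c("Crime,Drama", "Comedy,Drama", "Drama")
)

genre_ratings <- dat %>%
  separate_rows(genres, sep = ",") %>%  # one row per (row, genre) pair
  mutate(year = year(date)) %>%         # yearly bins keep the plot readable
  group_by(genres, year) %>%
  summarise(mean_rating = mean(av_rating), .groups = "drop")

p <- ggplot(genre_ratings, aes(x = year, y = mean_rating, color = genres)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Average rating", color = "Genre")
```

Yearly binning is a design choice for readability; grouping by the raw `date` instead works too but produces a noisier plot.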
4. Forecast Number of Ratings
We would now like to examine the overall pattern of ratings over time and forecast the number of ratings for the next 12 months.
- First, we would like to aggregate the number of ratings over time. To aggregate data monthly, we can use group_by() and round dates up with ceiling_date() to group the data by month. Subsequently, we summarise() and count observations using
- Filter data from
- Convert data.frame to a
- Use your favorite forecasting technique, e.g. ARIMA with auto.arima() or neural networks with nnetar() (both from the forecast package), to forecast the time series.
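A minimal sketch of the aggregation and forecasting steps above. It uses 3000 randomly drawn dates between 1990 and 2018 as a hypothetical stand-in for `dat`, so the output says nothing about the real ratings, and it does not apply the date filter mentioned above. Months with no observations are filled in with zero counts via `tidyr::complete()`, because `ts()` assumes equally spaced observations.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(forecast)

set.seed(1)
# Hypothetical stand-in for `dat`: 3000 random dates between 1990 and 2018
all_days <- seq(as.Date("1990-01-01"), as.Date("2018-12-31"), by = "day")
dat <- tibble(date = sample(all_days, 3000, replace = TRUE))

# Count observations per month; ceiling_date() rounds each date up to a month boundary
monthly <- dat %>%
  mutate(month = ceiling_date(date, unit = "month")) %>%
  group_by(month) %>%
  summarise(n_ratings = n(), .groups = "drop") %>%
  # add months with no observations as zero counts so the series is regular
  complete(month = seq(min(month), max(month), by = "month"),
           fill = list(n_ratings = 0L)) %>%
  arrange(month)

# Monthly time series starting at the first aggregated month
first_month <- min(monthly$month)
ratings_ts <- ts(monthly$n_ratings,
                 start = c(year(first_month), month(first_month)),
                 frequency = 12)

# Forecast the next 12 months with two techniques
fc_arima <- forecast(auto.arima(ratings_ts), h = 12)
fc_nnet  <- forecast(nnetar(ratings_ts), h = 12)
```

`plot(fc_arima)` and `plot(fc_nnet)` draw each forecast with base graphics. On the real data, replace the toy `dat` with the table read in Step 0.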
How do the different forecasting techniques compare? What (seasonal) patterns do you see in the time series over time?
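One common way to compare techniques is to hold out the last 12 months, fit each model on the remaining data, and compare test-set errors with `forecast::accuracy()`. `stl()` then splits the series into trend, seasonal and remainder components so that yearly patterns become visible. The sketch below uses a synthetic monthly series with a built-in yearly cycle (hypothetical, not the ratings data), so the resulting numbers are only illustrative.

```r
library(forecast)

set.seed(42)
# Hypothetical monthly counts with a yearly cycle, standing in for the real series
ratings_ts <- ts(100 + 10 * sin(2 * pi * (1:120) / 12) + rnorm(120, sd = 2),
                 start = c(2009, 1), frequency = 12)

# Hold out the last 12 months for an out-of-sample comparison
train <- window(ratings_ts, end = c(2017, 12))
test  <- window(ratings_ts, start = c(2018, 1))

fc_arima <- forecast(auto.arima(train), h = length(test))
fc_nnet  <- forecast(nnetar(train), h = length(test))

# Test-set errors of each method (lower RMSE/MAE is better)
comparison <- rbind(
  arima  = accuracy(fc_arima, test)["Test set", c("RMSE", "MAE")],
  nnetar = accuracy(fc_nnet, test)["Test set", c("RMSE", "MAE")]
)
comparison

# Seasonal-trend decomposition to inspect the yearly pattern
decomp <- stl(ratings_ts, s.window = "periodic")
```

`plot(decomp)` shows the components. A single 12-month holdout is a rough comparison; `tsCV()` from the same package supports rolling-origin evaluation if a more robust check is needed.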