Workshop

Agenda

Day 1

Time            Description
08:00 - 09:00   Introduction
09:00 - 10:30   Setup of Teams and Development Environment, Small exercise
10:30 - 12:00   Introduction to the Shiny Use-Case
13:00 - 14:00   Sketching First UI Prototype, Feedback round
14:00 - 18:00   Developing First Prototype
18:00           END OF DAY (HARD LIMIT)

Day 2

Time            Description
09:00 - 10:00   Peer-feedback round
10:00 - 12:00   Second iteration of prototype
13:00 - 14:00   Preparation of presentation
14:00 - 15:30   Final presentations

Project Description

The holiday season is around the corner and we are flooded with information about possible destinations. The goal is to create a Shiny app that extracts useful information from different data sources to ultimately answer the question of which holiday destination to choose.

We have pre-selected the following destinations:

  • Crete
  • Rome
  • Vienna
  • Lisbon
  • Mallorca

We would now like to present information for each of these destinations using different information sources:

  • AirBnB (for rooms, prices, ratings, etc.)
  • Google Places (nearby restaurants, museums, maps)
  • Statistics from OECD (PPP, Quality of living, demographics, etc.)
  • Weather Data (forecasts and historical, temperature, rain)
  • Twitter Data (analyzing Twitter stream for respective location, sentiment, important words, etc.)

Each team shall choose one information source and create a Shiny Dashboard page presenting useful information. The respective destination will serve as an input and can be changed interactively.
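Each team's page therefore boils down to a pair of functions: a UI function that receives the module id plus the current destination, and a server function that receives the destination as a reactive. A minimal sketch (the `mod_example_*` names and the body are illustrative; the calling convention follows the template's `callModule()` style shown below):

```r
library(shiny)

# UI part: 'id' namespaces all inputs/outputs via NS();
# 'dest' is the currently selected destination (used here only for the title).
mod_example_ui <- function(id, dest) {
  ns <- NS(id)
  tagList(
    h2(paste("Information for", dest)),
    textOutput(ns("summary"))
  )
}

# Server part: 'dest' arrives as a reactive, so outputs update
# whenever the user picks another destination in the sidebar.
mod_example_server <- function(input, output, session, dest) {
  output$summary <- renderText({
    paste("You selected:", dest())
  })
}

# Wired up in inst/app_server.R via:
# callModule(mod_example_server, "example_ui_1", reactive(input$dest))
```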

Project Template

  • The template has been created using golem
  • Settings were applied using dev/01_start.R and dev/02_dev.R
  • Five modules have been created, which are called from inst/app_server.R
  • Check out the template from GitHub at https://github.com/Quantargo/travelboard
  • Each team should only work on one module - additional functions can be added to the package as well.

See also https://rstudio.github.io/shinydashboard/structure.html for an overview of shinydashboard elements.
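The dev scripts mentioned above drive the usual golem setup; a rough sketch of the calls involved, run interactively from the project root (the module names match the template, but the exact DESCRIPTION values are assumptions):

```r
# dev/01_start.R: one-time project setup
golem::fill_desc(
  pkg_name = "travelboard",                                     # assumed
  pkg_title = "Travel Board",                                   # assumed
  pkg_description = "Shiny dashboard comparing destinations."   # assumed
)
golem::set_golem_options()

# dev/02_dev.R: scaffold one module per information source
# (creates R/mod_<name>.R with matching mod_<name>_ui / mod_<name>_server stubs)
golem::add_module(name = "airbnb")
golem::add_module(name = "places")
golem::add_module(name = "statistics")
golem::add_module(name = "weather")
golem::add_module(name = "news")
```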

inst/app_server.R

app_server <- function(input, output, session) {
  output$ui <- renderUI({
    item <- input$menu
    switch(item, 
           airbnb = mod_airbnb_ui("airbnb_ui_1", input$dest),
           news = mod_news_ui("news_ui_1", input$dest),
           places = mod_places_ui("places_ui_1", input$dest),
           statistics = mod_statistics_ui("statistics_ui_1", input$dest),
           weather = mod_weather_ui("weather_ui_1", input$dest)
           )
  })

  # List the first level callModules here
  callModule(mod_airbnb_server, "airbnb_ui_1", reactive(input$dest))
  callModule(mod_news_server, "news_ui_1", reactive(input$dest))
  callModule(mod_places_server, "places_ui_1", reactive(input$dest))
  callModule(mod_statistics_server, "statistics_ui_1", reactive(input$dest))
  callModule(mod_weather_server, "weather_ui_1", reactive(input$dest))
}

inst/app_ui.R

app_ui <- function(destinations = c("Crete", "Lisbon", "Mallorca", "Rome", "Vienna")) {
  tagList(
    # Leave this function for adding external resources
    golem_add_external_resources(),
    # List the first level UI elements here 
    
    dashboardPage(
      dashboardHeader(title = "Travel Board"),
      dashboardSidebar(
        sidebarMenu(id = "menu",
                    selectInput("dest", "Destination", destinations),
                    menuItem("Airbnb", tabName = "airbnb", icon = icon("home"), selected = TRUE),
                    menuItem("Places", tabName = "places", icon = icon("map-marker-alt")),
                    menuItem("Statistics", tabName = "statistics", icon = icon("chart-bar")),
                    menuItem("Weather", tabName = "weather", icon = icon("sun")),
                    menuItem("News", tabName = "news", icon = icon("newspaper"))
        )
      ),
      dashboardBody(
        # Boxes need to be put in a row (or column)
        uiOutput("ui")
      )
    )
  )
}

AirBnB

  • You can also download the script here.
  • The script is also available in the workspace at ~/workshop/airbnb.R.
  • Datasets are available in the workspace at ~/workshop/data/airbnb/*.
# From http://insideairbnb.com/get-the-data.html
# Nice Geo-Example: https://shiny.rstudio.com/gallery/superzip-example.html

library(tidyverse)
library(leaflet)
library(leaflet.extras)
library(googleway)
library(rgdal)
# apt-get install libjq-dev libjpeg-dev libgdal-dev

dat <- read_rds("data/airbnb/crete.rds")
summary(dat)
dat_proc <- dat %>% 
  mutate(price = as.numeric(sub("$", "", price, fixed = TRUE)))

hist(dat_proc$price)

leaflet() %>% 
  addTiles() %>% 
  addMarkers(lng=dat_proc$longitude, 
             lat=dat_proc$latitude, popup=dat_proc$name, 
             clusterOptions = markerClusterOptions())

# Beware that the heatmap is NOT interpolated -> each point is plotted on its own
# -> regions with many listings will automatically appear "hotter"
# See also https://stackoverflow.com/questions/44749346/r-heatmap-stat-density2d-ggmap-vs-addheatmap-shiny-leaflet
# https://gis.stackexchange.com/questions/168886/r-how-to-build-heatmap-with-the-leaflet-package
leaflet() %>% 
  addTiles() %>% 
  addMarkers(lng=dat_proc$longitude, 
             lat=dat_proc$latitude, popup=dat_proc$name, 
             clusterOptions = markerClusterOptions()) %>%
  addHeatmap(lng = dat_proc$longitude, lat = dat_proc$latitude, intensity = dat_proc$price,
             blur = 20, max = 0.05, radius = 15)

## Alternative maps using googleway
# Note: Slow for more than 1000 elements, again, not interpolated "properly"
key <- "AIzaSyBCGvNSks4_NvBcAwdRLw9hXM0J0RkQhQg"
dat_proc_head <- head(dat_proc, 100)
dat_proc_head <- dat_proc_head[!is.na(dat_proc_head$price), ]
google_map(data = dat_proc_head, key = key) %>%
  add_markers(lat = "latitude", lon = "longitude", info_window = "name", cluster = TRUE) %>%
  add_heatmap(lat = "latitude", lon = "longitude", weight = "price", option_radius = 0.15, legend = TRUE)

Google Places

  • You can also download the script here.
  • The script is also available in the workspace at ~/workshop/google_places.R.
  • Datasets are available in the workspace at ~/workshop/data/google_places/*.
# See also https://cran.r-project.org/web/packages/googleway/vignettes/googleway-vignette.html
library(googleway)
library(tidyr)
library(readr)

# Google API key (same key as used in the AirBnB script)
key <- "AIzaSyBCGvNSks4_NvBcAwdRLw9hXM0J0RkQhQg"

dat <- read_rds("data/google_places/crete.rds")
restaurants <- dat[["Restaurants"]]

restaurants$info <- paste0("<b>Restaurant Name: </b>", restaurants$name)
google_map(data = restaurants, key = key) %>%
  add_markers(lat = "lat", lon = "lng", info_window = "info") %>%
  add_heatmap(lat = "lat", lon = "lng", weight = "user_ratings_total", option_radius = 0.15, legend = TRUE)

bars <- dat[["Bars"]]
bars$info <- paste0("<b>Bar Name: </b>", bars$name)
google_map(data = bars, key = key) %>%
  add_markers(lat = "lat", lon = "lng", info_window = "info") %>%
  add_heatmap(lat = "lat", lon = "lng", weight = "user_ratings_total", option_radius = 0.15, legend = TRUE)

# Get spots based on geolocation
# res <- google_places(location = c(-37.918, 144.968),
#                      keyword = "Cinema",
#                      radius = 5000,
#                      key = key)

Statistics from OECD

  • You can also download the script here.
  • The script is also available in the workspace at ~/workshop/oecd.R.
  • Datasets are available in the workspace at ~/workshop/data/oecd/*.
# See also https://github.com/expersso/OECD
# - e.g. Price levels,
# - Unemployment, Youth Unemployment  
# - Corruption
# - Better Life Index (BLI)
#
# Better Life index: http://www.oecdbetterlifeindex.org/
# https://stats.oecd.org/

library(OECD)
library(tidyverse)
# https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes
countries <- c(Austria = "AUT", Greece = "GRC", Italy = "ITA", Spain = "ESP", Portugal = "PRT")
               #Switzerland = "CHE")

# Search at https://stats.oecd.org using box on the left 
# Alternatively, use search_dataset() function from the OECD package
dataset_list <- get_datasets()
search_dataset("Better Life", data = dataset_list) # BLI
search_dataset("Economic Outlook No 105", data = dataset_list) # EO (Forecasts for Unemployment, Growth, etc.)
search_dataset("Population", data = dataset_list) # RPOP (Population Statistics by Sex, Age)
search_dataset("Purchasing Power Parity", data = dataset_list) # PPPGDP (PPP, Exchange Rates)

bli <- readRDS(file = "data/oecd/bli.rds")
df <- bli %>% 
  filter(INEQUALITY == "TOT", LOCATION %in% countries)
ggplot(df) + geom_bar(aes(x = LOCATION, y = obsValue, fill = LOCATION), stat = "identity") + 
  facet_wrap(~INDICATOR, scales = "free_y")

# Plot Life Satisfaction only
# See https://stats.oecd.org for more descriptions
bli %>% 
  filter(INEQUALITY == "TOT", LOCATION %in% countries, INDICATOR == "SW_LIFS") %>%
ggplot() + geom_bar(aes(x = LOCATION, y = obsValue, fill = LOCATION), stat = "identity")

# Get PPP values, exchange rates, etc.
# Database inventory: http://www.oecd.org/sdd/na/44221974.pdf
# See also https://www.oecd.org/sdd/prices-ppp/purchasingpowerparities-frequentlyaskedquestionsfaqs.htm
pppgdp <- readRDS(file = "data/oecd/pppgdp.rds")
dat_filter <- pppgdp %>% 
  filter(LOCATION %in% countries, UNIT == "NATUSD")
ggplot(dat_filter) + 
  geom_line(aes(obsTime, obsValue, group = LOCATION, color = LOCATION))

# https://www.oecd.org/eco/outlook/EO104_Database_Inventory.pdf
eo <- readRDS(file = "data/oecd/eo.rds")
# Plot (projected) unemployment rate (UNR), 
eo %>%
  mutate(obsTime = as.integer(obsTime), obsValue = obsValue / 100) %>%
  filter(LOCATION %in% countries, VARIABLE == "UNR") %>%
  ggplot() + 
  geom_line(aes(obsTime, obsValue, group = LOCATION, color = LOCATION)) + 
  scale_y_continuous(labels = scales::percent)
  
# Plot (projected) GDP growth
eo %>%
  mutate(obsTime = as.integer(obsTime)) %>%
  filter(LOCATION %in% countries, VARIABLE == "GDPV_ANNPCT") %>%
  ggplot() + 
  geom_line(aes(obsTime, obsValue, group = LOCATION, color = LOCATION))

# Plot (projected) Population growth
eo %>%
  mutate(obsTime = as.integer(obsTime)) %>%
  filter(LOCATION %in% countries, VARIABLE == "POP") %>%
  group_by(LOCATION) %>%
  arrange(obsTime) %>%
  mutate(diff = obsValue - dplyr::lag(obsValue), 
         percent_diff = diff / dplyr::lag(obsValue)) %>%
  ggplot() + 
  geom_line(aes(obsTime, percent_diff, group = LOCATION, color = LOCATION))

Weather Data

  • You can also download the script here.
  • The script is also available in the workspace at ~/workshop/weather.R.
  • Datasets are available in the workspace at ~/workshop/data/weather/*.
library(owmr)
library(rnoaa)
library(leaflet)
library(tidyverse)  # for mutate(), select(), ggplot(), read_rds()
#library(rwunderground) -> would be nice!

apikey <- "d7eae13fe954ea0e04b0c40a172c4a10"
owmr_settings(apikey)

# Current Temperature by City
cities <- c("Crete", "Rome", "Vienna", "Lisbon", "Palma De Mallorca")
res <- lapply(cities, function(x) {
  get_current(x, units = "metric") %>% owmr_as_tibble()
})
names(res) <- cities
sapply(res, function(x) x$temp_max)

# Get Forecast by City
fc <- lapply(cities, owmr::get_forecast)
names(fc) <- cities
# Forecast for Vienna
fc_vienna <- fc$Vienna$list
fc_vienna %>% mutate(dt = as.POSIXct(dt, origin = "1970-01-01")) %>% 
  select(dt, main.temp, main.temp_min, main.temp_max, clouds.all, wind.speed, rain.3h) %>%
  reshape2::melt(id.vars = "dt") %>%
  mutate(type = case_when(grepl("temp", variable) ~ "Temperature",
                          grepl("wind", variable) ~ "Wind",
                          grepl("clouds", variable) ~ "Clouds")) %>%
  ggplot() + 
    geom_line(aes(dt, value, color = variable, group = variable)) + 
    facet_wrap(~type, scales = "free_y", ncol = 1)

# Reverse geo-lookup
# library(googleway)
# key <- "AIzaSyBCGvNSks4_NvBcAwdRLw9hXM0J0RkQhQg"
# set_key(key = key)
# google_geocode(address = "Vienna, Austria") -> (16.37382, 48.20817)
# google_geocode(address = "Crete, Greece") -> (24.80927, 35.24012)
# google_geocode(address = "Rome, Italy") -> (12.49637, 41.90278)
# google_geocode(address = "Mallorca, Spain") -> (3.017571, 39.69526)
# google_geocode(address = "Lisbon, Portugal") -> (-9.139337, 38.72225)

## Temperature map
leaflet() %>%
  addTiles() %>%
  addProviderTiles(providers$OpenWeatherMap.Temperature,
                   options = providerTileOptions(apiKey=apikey)) %>%
  setView(24.80927, 35.24012, zoom = 9)

## Rain map
leaflet() %>%
  addTiles() %>%
  addProviderTiles(providers$OpenWeatherMap.Rain,
                   options = providerTileOptions(apiKey=apikey)) %>%
  setView(24.80927, 35.24012, zoom = 9)

## Wind map
leaflet() %>%
  addTiles() %>%
  addProviderTiles(providers$OpenWeatherMap.Wind,
                   options = providerTileOptions(apiKey=apikey)) %>%
  setView(24.80927, 35.24012, zoom = 9)

## Pressure map
leaflet() %>%
  addTiles() %>%
  addProviderTiles(providers$OpenWeatherMap.Pressure,
                   options = providerTileOptions(apiKey=apikey)) %>%
  setView(24.80927, 35.24012, zoom = 9)

## Clouds map
leaflet() %>%
  addTiles() %>%
  addProviderTiles(providers$OpenWeatherMap.Clouds,
                   options = providerTileOptions(apiKey=apikey)) %>%
  setView(24.80927, 35.24012, zoom = 9)

## Historical Weather Data
# PRCP = Precipitation (tenths of mm)
# SNOW = Snowfall (mm)
# SNWD = Snow depth (mm)
# TMAX = Maximum temperature (tenths of degrees C)
# TMIN = Minimum temperature (tenths of degrees C)

dat <- read_rds("data/weather/crete.rds")

dat %>% 
  mutate(date = as.Date(date)) %>% 
  ggplot() + 
  geom_line(aes(date, value, color = datatype, group = datatype)) +  
  facet_wrap(~datatype, scales = "free_y", ncol = 1)

# More recent -> last year
dat %>% 
  mutate(date = as.Date(date)) %>% 
  filter(date >= as.Date("2018-01-01")) %>% 
  ggplot() + 
  geom_line(aes(date, value, color = datatype, group = datatype)) + 
  facet_wrap(~datatype, scales = "free_y", ncol = 1)

Twitter/News Data

  • You can also download the script here.
  • The script is also available in the workspace at ~/workshop/twitter_news.R.
  • Datasets are available in the workspace at ~/workshop/data/twitter/*.
# Script to analyze holiday destinations via twitter feeds
# See also https://rtweet.info/ for inspiration
library(rtweet)
library(tidyverse)
library(tidytext)
library(wordcloud2)
library(tm)

## Analyzing Twitter for Crete

rt <- read_rds("data/twitter/crete.rds")

## plot time series of tweets
rt %>%
  ts_plot("1 day") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold")) +
  labs(
    x = NULL, y = NULL,
    title = "Frequency of #crete Twitter statuses from past 9 days",
    subtitle = "Twitter status (tweet) counts aggregated using one-day intervals",
    caption = "\nSource: Data collected from Twitter's REST API via rtweet"
  )

# Who posted the most?
rt %>% 
  group_by(screen_name) %>%
  summarise(count = n()) %>%
  arrange(-count)

## Most important words
tweet_words <- rt %>% select(screen_name, created_at, text) %>% 
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE)
my_stop_words <- stop_words %>% select(-lexicon) %>% 
  bind_rows(tibble(word = c("https", "t.co", "rt", "amp", "4yig9gzh5t", "fyy2ceydhi", "78", "fakenews")))
tweet_words_interesting <- tweet_words %>% anti_join(my_stop_words, by = "word")

# count() above has already sorted by frequency
top_25 <- tweet_words_interesting %>% 
  slice(1:25)

top_25 %>% mutate(word = reorder(word, n, function(n) -n)) %>%
  ggplot() + geom_bar(aes(word, n), stat = "identity") + 
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) + 
  xlab("")

## Calc sentiment
bing_lex <- get_sentiments("bing")

sentiment_words <- rt %>% select(status_id, screen_name, created_at, text) %>% 
  unnest_tokens(word, text) %>% 
  inner_join(bing_lex, by = "word") 

sentiment_words %>% 
  mutate(created_at_day = lubridate::as_date(lubridate::round_date(created_at, "day"))) %>%
  ggplot() + 
  geom_bar(aes(created_at_day, fill = sentiment), stat = "count") + 
  facet_wrap(~sentiment, ncol = 1)


## Create word cloud
wordcloud2(top_25, color = "random-light", size = .6, shuffle = TRUE, rotateRatio = sample(1:100, 1) / 100)


## TODO: Analyzing NYTimes (as with Twitter data)
rt_nytimes <- read_rds("data/twitter/crete_nytimes.rds")
View(rt_nytimes$data)