udpipe: brussels_reviews_anno – R documentation

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

brussels_reviews_anno

Reviews of the AirBnB customers which are tokenised, POS tagged and lemmatised

Description

Reviews of the AirBnB customerswhich are tokenised, POS tagged and lemmatised. The data contains 1 row per document/token and contains the fields doc_id, language, sentence_id, token_id, token, lemma, xpos.
Data has been converted from UTF-8 to ASCII as in iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT") in order to be able to comply to CRAN policies.

Source

http://data.insideairbnb.com/belgium/bru/brussels/2015-10-03/visualisations/reviews.csv

Examples

## brussels_reviews_anno
data(brussels_reviews_anno)
head(brussels_reviews_anno)
sort(table(brussels_reviews_anno$xpos))

## Not run: 

##
## If you want to construct a similar dataset as the 
## brussels_reviews_anno dataset based on the udpipe library, do as follows
##

library(udpipe)
library(data.table)
data(brussels_reviews)

## The brussels_reviews contains comments on Airbnb sites in 3 languages: es, fr and nl
table(brussels_reviews$language)
bxl_anno <- split(brussels_reviews, brussels_reviews$language)

## Annotate the Spanish comments
m <- udpipe_download_model(language = "spanish-ancora")
m <- udpipe_load_model(file = m$file_model)
bxl_anno$es <- udpipe_annotate(object = m, x = bxl_anno$es$feedback, doc_id = bxl_anno$es$id)

## Annotate the French comments
m <- udpipe_download_model(language = "french-partut")
m <- udpipe_load_model(file = m$file_model)
bxl_anno$fr <- udpipe_annotate(object = m, x = bxl_anno$fr$feedback, doc_id = bxl_anno$fr$id)

## Annotate the Dutch comments
m <- udpipe_download_model(language = "dutch-lassysmall")
m <- udpipe_load_model(file = m$file_model)
bxl_anno$nl <- udpipe_annotate(object = m, x = bxl_anno$nl$feedback, doc_id = bxl_anno$nl$id)

brussels_reviews_anno <- lapply(bxl_anno, as.data.frame)
brussels_reviews_anno <- rbindlist(brussels_reviews_anno)
str(brussels_reviews_anno)

## End(Not run)

udpipe

Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

v0.8.5

MPL-2.0

Authors

Jan Wijffels [aut, cre, cph], BNOSAC [cph], Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic [cph], Milan Straka [ctb, cph], Jana Straková [ctb, cph]

Initial release

brussels_reviews_anno

Description

Source

See Also

Examples

udpipe

We don't support your browser anymore