Merge the token and meta data.tables of a tCorpus with another data.frame
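Add columns to the tokens or meta data by merging them with a data.frame df. This is only possible for unique matches, i.e. the combination of the columns specified in by must be unique in df, so that each row in tokens/meta matches at most one row in df.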
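The uniqueness requirement can be illustrated with plain base R. The sketch below is not the corpustools implementation. It assumes the merge behaves like a left join in which every row of tokens/meta keeps its position and gets NA when it has no match in df. The helper name `unique_merge()` and the toy `tokens` table are hypothetical.

```r
# Hypothetical helper: a base-R sketch of a "unique match" left join,
# assuming left-join semantics; this is NOT how corpustools implements it.
unique_merge <- function(x, df, by) {
  # Each combination of the `by` columns may occur at most once in df;
  # otherwise a row in x could match several rows and be duplicated.
  if (anyDuplicated(df[, by, drop = FALSE]) > 0) {
    stop("the 'by' columns are not unique in df")
  }
  # Remember the original row order, because merge() may reorder rows.
  x$.row <- seq_len(nrow(x))
  out <- merge(x, df, by = by, all.x = TRUE, sort = FALSE)
  out <- out[order(out$.row), , drop = FALSE]
  out$.row <- NULL
  rownames(out) <- NULL
  out
}

# Toy token table (hypothetical data).
tokens <- data.frame(doc_id = c("a", "a", "b", "c"),
                     token = c("This", "is", "oh", "so"),
                     stringsAsFactors = FALSE)
df <- data.frame(doc_id = c("a", "b"), test = c("A", "B"),
                 stringsAsFactors = FALSE)

res <- unique_merge(tokens, df, by = "doc_id")
res
```

Because uniqueness is checked before joining, the merge can never duplicate a token row, so the result has exactly as many rows as the input. Rows without a match (document "c") receive NA.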
df | A data.frame (a regular data.frame, a data.table or a tibble).
by | The columns to match on. They must exist in both tokens/meta and df. If the columns have different names in tokens/meta and df, use by.x and by.y instead.
by.x | The names of the columns used in tokens/meta.
by.y | The names of the columns used in df.
columns | Optionally, the specific columns from df to merge into tokens.
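The roles of by.x and by.y resemble the arguments of the same names in base R's merge(), and the effect of columns can be emulated by subsetting df before joining. The snippet below uses base R on hypothetical `tokens` and `df` tables to show this mapping. It is an analogy under those assumptions, not a description of how tc$merge works internally.

```r
# Token data uses `doc_id`; the external table calls the same key `id`.
tokens <- data.frame(doc_id = c("a", "a", "b"),
                     token = c("This", "is", "oh"),
                     stringsAsFactors = FALSE)
df <- data.frame(id = c("a", "b"), source = c("aa", "bb"), extra = c(1, 2),
                 stringsAsFactors = FALSE)

# Analogue of `columns`: keep only the key plus the columns to add.
columns <- "source"
df_sub <- df[, c("id", columns), drop = FALSE]

# Analogue of by.x / by.y: the key has a different name on each side.
# The key column in the result keeps the name from the first table (doc_id).
res <- merge(tokens, df_sub, by.x = "doc_id", by.y = "id",
             all.x = TRUE, sort = FALSE)
res
```

The `extra` column is not added because it was dropped before the join.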
Usage:
## R6 method for class tCorpus. Use as tc$method (where tc is a tCorpus object).
merge(df, by, by.x, by.y)
merge_meta(df, by, by.x, by.y)
Examples:
d = data.frame(text = c('This is an example. Best example ever.', 'oh my god', 'so good'),
               id = c('a','b','c'),
               source = c('aa','bb','cc'))
tc = create_tcorpus(d, doc_col='id', split_sentences = TRUE)

## merge on document id
df = data.frame(doc_id=c('a','b'), test=c('A','B'))
tc$merge(df, by='doc_id')
tc$tokens

## merge on document id and sentence
df = data.frame(doc_id=c('a','b'), sentence=1, test2=c('A','B'))
tc$merge(df, by=c('doc_id', 'sentence'))
tc$tokens

## merge on individual tokens (document, sentence and token position)
df = data.frame(doc_id=c('a','b'), sentence=1, token_id=c(3,4), test3=c('A','B'))
tc$merge(df, by=c('doc_id', 'sentence', 'token_id'))
tc$tokens

## merge document meta data on doc_id
meta = data.frame(doc_id=c('a','b'), test=c('A','B'))
tc$merge_meta(meta, by='doc_id')
tc$meta

## merge document meta data on another meta column
meta = data.frame(source=c('aa'), test2=c('A'))
tc$merge_meta(meta, by='source')
tc$meta