Subset tCorpus token data using a query
A convenience function that searches for contexts (documents or sentences) that match a query, and uses the results to subset the tCorpus token data.
See the documentation for search_contexts for an explanation of the query language.
Usage:
## R6 method for class tCorpus. Use as tc$method (where tc is a tCorpus object).
subset_query(query, feature = 'token', context_level = c('document','sentence','window'))
Arguments:
query: A character string that is a query. See search_contexts for query syntax.
feature: The name of the feature column on which the query is used.
context_level: Select whether the query and subset are performed at the document or sentence level.
window: If used, a word distance is used as the context (overrides context_level).
as_ascii: If TRUE, perform the search in ASCII.
not: If TRUE, perform a NOT search. Returns the articles/sentences for which the query is not found.
copy: If TRUE, return a modified copy of the data instead of subsetting the input tCorpus by reference.
Examples:
text = c('A B C', 'D E F. G H I', 'A D', 'GGG')
tc = create_tcorpus(text, doc_id = c('a','b','c','d'), split_sentences = TRUE)

## subset by reference
tc$subset_query('A')
tc$meta

## using copy mechanic
class(tc$tokens$doc_id)
tc2 = tc$subset_query('A AND D', copy=TRUE)
tc2$get_meta()
tc$meta ## (unchanged)
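The not and context_level arguments are described on this page but not exercised in the examples. The following is a minimal sketch of how they might be combined with copy = TRUE. It assumes the corpustools package is installed and that subset_query accepts these arguments by name as documented here. The object names tc_not and tc_sent are illustrative only.

```r
library(corpustools)

text = c('A B C', 'D E F. G H I', 'A D', 'GGG')
tc = create_tcorpus(text, doc_id = c('a','b','c','d'), split_sentences = TRUE)

## NOT search (assumed usage): keep the documents in which 'A' is not found
tc_not = tc$subset_query('A', not = TRUE, copy = TRUE)
tc_not$meta

## sentence-level search (assumed usage): keep only the sentences that match 'G'
tc_sent = tc$subset_query('G', context_level = 'sentence', copy = TRUE)
tc_sent$tokens

## because copy = TRUE was used, the original tCorpus is not modified
tc$meta
```

Using copy = TRUE here keeps tc intact, so several alternative subsets can be derived from the same tCorpus without rebuilding it.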