Filter features
Similar to tCorpus$subset, but instead of deleting rows it sets the specified feature to NA in the rows that do not match the filter. This is convenient because an analysis (e.g. a topic model) can use only a selection of features while the full article is retained, so that results can still be viewed in that context (e.g. in a topic browser).
Just as in subset, it is easy to use objects and functions in the filter, including the special functions for using term frequency statistics (see documentation for tCorpus$subset).
Usage:
## R6 method for class tCorpus. Use as tc$method (where tc is a tCorpus object).
feature_subset(column, new_column, subset)
column |
the column containing the feature to be used as the input |
subset |
logical expression indicating rows to keep in the tokens data; rows for which the logical expression is FALSE will be set to NA. |
new_column |
the column to save the filtered feature. Can be a new column or overwrite an existing one. |
min_freq |
an integer, specifying minimum token frequency. |
min_docfreq |
an integer, specifying minimum document frequency. |
max_freq |
an integer, specifying maximum token frequency. |
max_docfreq |
an integer, specifying maximum document frequency. |
min_char |
an integer, specifying minimum characters in a token |
max_char |
an integer, specifying maximum characters in a token |
Examples:
tc = create_tcorpus('a a a a b b b c c')
tc$feature_subset('token', 'tokens_subset1', subset = token_id < 5)
tc$feature_subset('token', 'tokens_subset2', subset = freq_filter(token, min = 3))
tc$tokens
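To make the difference between deleting rows and setting a feature to NA concrete, here is a minimal sketch in plain base R. It does not use corpustools and is not how the package implements feature_subset. The data frame `tokens`, the column name `token_subset` and the helper vectors `keep` and `keep_freq` are hypothetical names chosen only for illustration, and the frequency condition is only a rough stand-in for freq_filter.

```r
# Toy tokens table (hypothetical; plain base R, not a tCorpus object).
tokens <- data.frame(
  doc_id   = rep("doc1", 9),
  token_id = 1:9,
  token    = c("a", "a", "a", "a", "b", "b", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# Condition comparable to `subset = token_id < 5` in the example above.
keep <- tokens$token_id < 5

# Subsetting deletes the rows that fail the condition.
deleted <- tokens[keep, ]

# Filtering keeps every row and sets the feature to NA where the condition fails.
filtered <- tokens
filtered$token_subset <- ifelse(keep, filtered$token, NA_character_)

# Rough stand-in for a minimum-frequency condition: keep tokens occurring >= 3 times.
freq <- table(tokens$token)
keep_freq <- as.vector(freq[tokens$token]) >= 3
filtered$token_freq3 <- ifelse(keep_freq, filtered$token, NA_character_)

nrow(deleted)   # rows left after deleting
nrow(filtered)  # all rows are still present after filtering
filtered
```

In this sketch the filtered table keeps all nine rows, so the position of each token in the document is preserved, while only the new columns carry the selection.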