Match quanteda objects against token types
Developer function to match patterns in quanteda objects against token types.
object2id(
  x,
  types,
  valuetype = c("glob", "fixed", "regex"),
  case_insensitive = TRUE,
  concatenator = "_",
  levels = 1,
  remove_unigram = FALSE,
  keep_nomatch = FALSE
)

object2fixed(
  x,
  types,
  valuetype = c("glob", "fixed", "regex"),
  case_insensitive = TRUE,
  concatenator = "_",
  levels = 1,
  remove_unigram = FALSE,
  keep_nomatch = FALSE
)
x |
a list of character vectors, dictionary or collocations object |
types |
token types against which patterns are matched |
valuetype |
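the type of pattern matching: "glob" for glob-style wildcard expressions, "fixed" for exact matching, or "regex" for regular expressions |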
case_insensitive |
logical; if TRUE, ignore case when matching |
concatenator |
the concatenation character that joins multi-word
expressions in types |
levels |
integers specifying the levels of entries in a hierarchical
dictionary that will be applied. The top level is 1, and subsequent levels
describe lower nesting levels. Values may be combined, even if these
levels are not contiguous. |
remove_unigram |
if TRUE, ignore single-word patterns |
keep_nomatch |
keep patterns that did not match |
a list of integer vectors containing indices of matched types
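To make "indices of matched types" concrete, the following is a minimal base-R sketch of matching a single glob pattern against a vector of types. It is not quanteda's implementation; `match_glob` is a hypothetical helper introduced only for illustration, and it relies on `utils::glob2rx()` and `grep()` from base R.

```r
# Hypothetical helper, not part of quanteda: match one glob pattern
# against a character vector of token types and return integer indices.
types <- c("A", "AA", "B", "BB", "B_B", "C", "C-C")

match_glob <- function(pattern, types, case_insensitive = TRUE) {
  # glob2rx() converts a glob such as "a*" into an anchored regular expression
  regex <- utils::glob2rx(pattern)
  # grep() returns the positions in `types` that match the expression
  grep(regex, types, ignore.case = case_insensitive)
}

match_glob("a*", types)                            # positions of "A" and "AA"
match_glob("a*", types, case_insensitive = FALSE)  # no lower-case types, so empty
```

The returned positions index into `types`, which is the sense in which the return value above refers to indices.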
library("quanteda")

types <- c("A", "AA", "B", "BB", "B_B", "C", "C-C")

# dictionary
dict <- dictionary(list(A = c("a", "aa"),
                        B = c("BB", "B B"),
                        C = c("C", "C-C")))
object2fixed(dict, types)
object2fixed(dict, types, remove_unigram = TRUE)

# phrase
pats <- phrase(c("a", "aa", "zz", "bb", "b b"))
object2fixed(pats, types)
object2fixed(pats, types, keep_nomatch = TRUE)
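The multi-word entries in these examples ("B B", "b b") relate to the type "B_B" through the `concatenator` argument. The base-R sketch below illustrates that idea only; it is an assumption about the general mechanism, not quanteda's internal code, and the variable names `pattern` and `joined` are chosen here for illustration.

```r
# Hypothetical illustration in base R, not quanteda's internal code
types <- c("A", "AA", "B", "BB", "B_B", "C", "C-C")
concatenator <- "_"

# Split the multi-word pattern on spaces and rejoin it with the concatenator
pattern <- "b b"
joined <- paste(strsplit(pattern, " ", fixed = TRUE)[[1]], collapse = concatenator)

# Case-insensitive exact comparison against the token types
which(tolower(types) == tolower(joined))
```

With a different `concatenator`, the joined form changes accordingly, so the same pattern would no longer line up with "B_B".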