Detect/Locate Potential Non-Normalized Text
Detect/Locate potential issues with text data. This family of functions generates a collection of detection/location functions that can be accessed via the dollar sign or square bracket operators. Accessible functions include:
which_are()
is_it()
Contains contractions
Contains dates
Contains digits
Contains email addresses
Contains emoticons
Contains only white space
Contains an escaped backslash character
Contains Twitter-style hashtags
Contains HTML markup
Contains incomplete sentences (e.g., ends with ...)
Contains kerning (e.g., "The B O M B!")
Is a list of atomic vectors (not provided by which_are())
Contains potentially misspelled words
Contains a sentence with no ending punctuation
Contains commas with no space after them
Contains non-ASCII characters
Is a non-character vector (not provided by which_are())
Contains non-split sentences
Contains a Twitter-style handle used to tag others (use of the at symbol)
Contains a time stamp
Contains a URL
The functions above whose descriptions start with 'Is' rather than 'Contains'
are meta functions: they describe an attribute of the column/vector being passed
rather than attributes of its individual elements. The meta functions return a
logical of length one and are not available under which_are().
which_are returns an environment of functions that can be used to
locate and return the integer locations of the particular non-normalized text
named by the function.
is_it returns an environment of functions that can be used to
detect the particular non-normalized text named by the function. Each returns
a logical atomic vector of equal length to the input vector, except for the
meta functions, which return a logical of length one.
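The difference between the two return types can be sketched in base R. The sketch below is an illustration only: the environments detectors and locators, and the simple regular expression and is.list() checks inside them, are hypothetical stand-ins and are not the package's own definitions. It assumes that a locator behaves like which() applied to the matching detector.

```r
# Hypothetical stand-ins written in base R; NOT the package's own code.
detectors <- new.env()
detectors$digit <- function(x) grepl("[0-9]", x)  # element-wise detection
detectors$list_column <- function(x) is.list(x)   # meta: one value per vector

locators <- new.env()
locators$digit <- function(x) which(detectors$digit(x))  # integer locations

x <- c("The dog", "I like 2", NA)

detectors$digit(x)        # logical vector of the same length as x
detectors[["digit"]](x)   # the same function, reached via square brackets
locators$digit(x)         # integer positions of the flagged elements
detectors$list_column(x)  # a single logical describing the whole vector
```

In this sketch grepl() treats the NA element as a non-match; how the real functions handle missing values is not stated here and may differ.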
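wa <- which_are()
it <- is_it()
# integer locations of the elements that contain digits
wa$digit(c('The dog', "I like 2", NA))
# logical vector flagging the elements that contain digits
it$digit(c('The dog', "I like 2", NA))
# meta function: a single logical describing the whole input
is_it()$list_column(c('the dog', 'ate the chicken'))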