Group near elements of string vectors
This function groups elements of a string vector (character or string variable) according to the elements' distance ('similarity'). The more similar two string elements are, the more likely they are to be combined into one group.
group_str( strings, precision = 2, strict = FALSE, trim.whitespace = TRUE, remove.empty = TRUE, verbose = FALSE, maxdist )
strings |
Character vector with string elements. |
precision |
Maximum distance ("precision") between two string elements, which is allowed to treat them as similar or equal. Smaller values mean less tolerance in matching. |
strict |
Logical; if |
trim.whitespace |
Logical; if |
remove.empty |
Logical; if |
verbose |
Logical; if |
maxdist |
Deprecated. Please use |
A character vector where similar string elements (values) are recoded into a new, single value. The return value has the same length as strings, i.e. grouped elements appear multiple times, so the count for each grouped string is still available (see 'Examples').
library(sjmisc)  # provides group_str() and frq()

oldstring <- c("Hello", "Helo", "Hole", "Apple", "Ape", "New", "Old", "System", "Systemic")
newstring <- group_str(oldstring)

# see result
newstring

# count for each group
table(newstring)

# print table to compare original and grouped string
frq(oldstring)
frq(newstring)

# larger groups
newstring <- group_str(oldstring, precision = 3)
frq(oldstring)
frq(newstring)

# be more strict with matching pairs
newstring <- group_str(oldstring, precision = 3, strict = TRUE)
frq(oldstring)
frq(newstring)
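To get a feel for what a "distance" between two string elements means when choosing a value for precision, it can help to look at pairwise edit distances directly. The sketch below uses base R's adist() (generalized Levenshtein distance) purely as an illustration. It is an assumption that this is comparable to the distance used by group_str(); this page does not state which metric the function uses internally, and the variable names are hypothetical.

```r
# Illustration only: pairwise edit distances with base R's utils::adist().
# Assumption: group_str() may use a different distance measure internally.
words <- c("Hello", "Helo", "Hole", "Apple", "Ape")

# Square matrix of generalized Levenshtein distances between all elements
d <- adist(words)
dimnames(d) <- list(words, words)
d

# Index pairs (row, col) whose distance is at most 2, each pair listed once
close_pairs <- which(d <= 2 & upper.tri(d), arr.ind = TRUE)
close_pairs
```

Pairs with a small distance, such as "Hello" and "Helo" (one deleted character), are the kind of elements a low precision value would already treat as similar, while larger values tolerate more edits.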