Unicode Pattern Operators
Manipulate and combine Unicode Properties.
unicode_inverse(x, char_class = TRUE)
unicode_union(..., char_class = TRUE)
unicode_intersect(x, y, char_class = TRUE)
unicode_setdiff(x, y, char_class = TRUE)
x | A character vector containing Unicode General Category or Unicode Properties. Use the functional forms (…)
char_class |
... | Character vectors containing Unicode General Category or Unicode Properties. Use the functional forms (…)
y | A character vector containing Unicode General Category or Unicode Properties. Use the functional forms (…)
Use these with ICU-based regular expression engines (stringi and stringr).
# POSIX [:punct:] is more or less equivalent to the union of
# Unicode punctuation and symbol general categories
unicode_union(ugc_punctuation(), ugc_symbol())

# Everything except "A" to "Z" (including punctuation, control chars etc.)
unicode_inverse("[A-Z]")

# Uppercase letters, except "A" to "Z"
unicode_setdiff(ugc_uppercase_letter(), "[A-Z]")

# "A" to "F" (in upper or lower case)
unicode_intersect(ugc_letter(), up_ascii_hex_digit())

# Usage
x <- c(letters, LETTERS)
rx <- unicode_intersect(ugc_letter(), up_ascii_hex_digit())
stringi::stri_extract_first_regex(x, rx)
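For readers who want to see what these set operations look like at the regex level, here is a minimal sketch written directly in ICU set syntax and run with stringi. The pattern strings are hand-written assumptions about equivalent ICU expressions; they are not necessarily the exact strings the functions above return, and the variable names are illustrative only.

```r
library(stringi)

# Hand-written ICU set expressions, assumed to match the same characters
# as the four examples above (not the package's actual return values).
union_rx     <- "[\\p{P}\\p{S}]"                      # punctuation or symbol
inverse_rx   <- "[^A-Z]"                              # anything except "A" to "Z"
setdiff_rx   <- "[\\p{Lu}--[A-Z]]"                    # uppercase letters minus "A" to "Z"
intersect_rx <- "[\\p{L}&&\\p{ASCII_Hex_Digit}]"      # letters that are ASCII hex digits

# Keep only the letters that are also hexadecimal digits.
x <- c(letters, LETTERS)
hex_letters <- x[stri_detect_regex(x, intersect_rx)]
```

In ICU regular expressions, `&&` inside a bracket expression denotes set intersection and `--` denotes set difference, which is why these operators are tied to ICU-based engines.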