Unicode General Categories
Match a Unicode General Category.
ugc_cased_letter(lo, hi, char_class = TRUE)
ugc_close_punctuation(lo, hi, char_class = TRUE)
ugc_connector_punctuation(lo, hi, char_class = TRUE)
ugc_control(lo, hi, char_class = TRUE)
ugc_currency_symbol(lo, hi, char_class = TRUE)
ugc_dash_punctuation(lo, hi, char_class = TRUE)
ugc_decimal_number(lo, hi, char_class = TRUE)
ugc_enclosing_mark(lo, hi, char_class = TRUE)
ugc_final_punctuation(lo, hi, char_class = TRUE)
ugc_format_control(lo, hi, char_class = TRUE)
ugc_initial_punctuation(lo, hi, char_class = TRUE)
ugc_letter(lo, hi, char_class = TRUE)
ugc_letter_number(lo, hi, char_class = TRUE)
ugc_line_separator(lo, hi, char_class = TRUE)
ugc_lowercase_letter(lo, hi, char_class = TRUE)
ugc_mark(lo, hi, char_class = TRUE)
ugc_math_symbol(lo, hi, char_class = TRUE)
ugc_modifier_letter(lo, hi, char_class = TRUE)
ugc_modifier_symbol(lo, hi, char_class = TRUE)
ugc_nonspacing_mark(lo, hi, char_class = TRUE)
ugc_number(lo, hi, char_class = TRUE)
ugc_open_punctuation(lo, hi, char_class = TRUE)
ugc_other(lo, hi, char_class = TRUE)
ugc_other_letter(lo, hi, char_class = TRUE)
ugc_other_number(lo, hi, char_class = TRUE)
ugc_other_punctuation(lo, hi, char_class = TRUE)
ugc_other_symbol(lo, hi, char_class = TRUE)
ugc_paragraph_separator(lo, hi, char_class = TRUE)
ugc_private_use_control(lo, hi, char_class = TRUE)
ugc_punctuation(lo, hi, char_class = TRUE)
ugc_separator(lo, hi, char_class = TRUE)
ugc_space_separator(lo, hi, char_class = TRUE)
ugc_spacing_mark(lo, hi, char_class = TRUE)
ugc_surrogate_control(lo, hi, char_class = TRUE)
ugc_symbol(lo, hi, char_class = TRUE)
ugc_titlecase_letter(lo, hi, char_class = TRUE)
ugc_unassigned_control(lo, hi, char_class = TRUE)
ugc_uppercase_letter(lo, hi, char_class = TRUE)
UGC_UPPERCASE_LETTER
UGC_LOWERCASE_LETTER
UGC_TITLECASE_LETTER
UGC_CASED_LETTER
UGC_MODIFIER_LETTER
UGC_OTHER_LETTER
UGC_LETTER
UGC_NONSPACING_MARK
UGC_SPACING_MARK
UGC_ENCLOSING_MARK
UGC_MARK
UGC_DECIMAL_NUMBER
UGC_LETTER_NUMBER
UGC_OTHER_NUMBER
UGC_NUMBER
UGC_CONNECTOR_PUNCTUATION
UGC_DASH_PUNCTUATION
UGC_OPEN_PUNCTUATION
UGC_CLOSE_PUNCTUATION
UGC_INITIAL_PUNCTUATION
UGC_FINAL_PUNCTUATION
UGC_OTHER_PUNCTUATION
UGC_PUNCTUATION
UGC_MATH_SYMBOL
UGC_CURRENCY_SYMBOL
UGC_MODIFIER_SYMBOL
UGC_OTHER_SYMBOL
UGC_SYMBOL
UGC_SPACE_SEPARATOR
UGC_LINE_SEPARATOR
UGC_PARAGRAPH_SEPARATOR
UGC_SEPARATOR
UGC_CONTROL
UGC_FORMAT_CONTROL
UGC_SURROGATE_CONTROL
UGC_PRIVATE_USE_CONTROL
UGC_UNASSIGNED_CONTROL
UGC_OTHER
lo | A non-negative integer. Minimum number of repeats, when grouped.
hi | A positive integer. Maximum number of repeats, when grouped.
char_class | A logical value. Should the category be wrapped in a character class?
An object of class regex (inherits from character) of length 1.
A character vector representing part or all of a regular expression.
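The effect of lo and hi can be sketched without the package, using a hand-written pattern. This is an illustration only: it assumes that the generated regex is a character class followed by a {lo,hi} quantifier, and the pattern string below is written by hand rather than taken from the package's output.

```r
library(stringi)

# Hand-written pattern (an assumption about the shape of the generated regex):
# a character class for Decimal_Number, repeated between 2 and 3 times.
pattern <- "[\\p{Nd}]{2,3}"

x <- "a1 b22 c333 d4444"

# Runs of fewer than 2 digits are skipped; longer runs are cut at 3 digits.
runs <- stri_extract_all_regex(x, pattern)[[1]]
print(runs)
```

Here "1" is too short to match, "22" and "333" match whole, and "4444" yields only its first three digits, because the single digit left over is below the minimum.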
Table 12 of the Unicode Standard Annex #44 defines the Unicode General Categories. http://www.unicode.org/reports/tr44
You can see which characters are contained in a category by visiting, e.g., http://www.fileformat.info/info/unicode/category/Nd/list.htm
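Category membership can also be checked locally. The sketch below uses stringi directly rather than the ugc_* helpers; it relies on the ICU regex syntax \p{Xx}, where Xx is the two-letter category abbreviation from the annex (Nd for decimal numbers, Sc for currency symbols). The sample string is made up for illustration.

```r
library(stringi)

# Made-up sample text: ASCII digits, an Arabic-Indic digit (U+0663),
# a euro sign (U+20AC), and a dollar sign.
x <- "Room 42, floor \u0663, price \u20ac7 or $8"

# \p{Nd} matches any character in the Decimal_Number category.
digits <- stri_extract_all_regex(x, "\\p{Nd}")[[1]]

# \p{Sc} matches any character in the Currency_Symbol category.
currency <- stri_extract_all_regex(x, "\\p{Sc}")[[1]]

print(digits)
print(currency)
```

The Arabic-Indic digit is extracted along with the ASCII digits, which shows how the category is broader than [0-9].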
# Classes
ugc_lowercase_letter()
ugc_decimal_number()
ugc_paragraph_separator()
ugc_currency_symbol()

# With repetition
ugc_nonspacing_mark(3, 6)
ugc_separator(1, Inf)
ugc_dash_punctuation(0, Inf)

# Without a class wrapper
ugc_titlecase_letter(char_class = FALSE)

# Constants
UGC_UPPERCASE_LETTER
UGC_LETTER_NUMBER
UGC_MATH_SYMBOL
UGC_FORMAT_CONTROL

## Not run:
# All the Unicode general categories.
# Not run, since it generates lots of output
ls("package:rebus.unicode", pattern = "^ugc")
## End(Not run)

# Usage
library(rebus.base)
x <- "I exchanged $1000 for \u20ac665.41 and \u00a3243.13."
(rx <- capture(ugc_currency_symbol()) %R%
  capture(
    ugc_decimal_number(1, Inf) %R%
    optional(group("." %R% ugc_decimal_number(2)))
  )
)
stringi::stri_match_all_regex(x, rx)