utf8: utf8_normalize – R documentation



Text Normalization

Description

Transform text to Unicode Normalization Form C (NFC), optionally mapping to lowercase, applying compatibility maps, replacing curly single quotes, and removing default ignorable characters.

Usage

utf8_normalize(x, map_case = FALSE, map_compat = FALSE,
               map_quote = FALSE, remove_ignorable = FALSE)

Arguments

`x`	a character object.
`map_case`	a logical value indicating whether to apply Unicode case mapping to the text. For most languages, this transformation changes uppercase characters to their lowercase equivalents.
`map_compat`	a logical value indicating whether to apply Unicode compatibility mappings to the characters, that is, the mappings required for the NFKC and NFKD normal forms.
`map_quote`	a logical value indicating whether to replace curly single quotes and Unicode apostrophe characters with the ASCII apostrophe (U+0027).
`remove_ignorable`	a logical value indicating whether to remove Unicode "default ignorable" characters such as zero-width spaces and soft hyphens.
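The following is a minimal sketch, assuming the utf8 package is installed, that turns on each flag by itself so its effect is visible in isolation. The input strings are illustrative choices, not taken from the package documentation.

```r
library(utf8)

# Each call enables exactly one flag; the others keep their FALSE defaults.
lower   <- utf8_normalize("HELLO", map_case = TRUE)                   # case mapping
compat  <- utf8_normalize("\ufb01le", map_compat = TRUE)              # U+FB01 "fi" ligature
quoted  <- utf8_normalize("don\u2019t", map_quote = TRUE)             # U+2019 curly quote
visible <- utf8_normalize("zero\u200bwidth", remove_ignorable = TRUE) # U+200B zero-width space

print(c(lower, compat, quoted, visible))
# [1] "hello"     "file"      "don't"     "zerowidth"
```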

Details

utf8_normalize converts the elements of a character object to Unicode Normalization Form C (NFC, canonical composition) while applying the character maps specified by the map_case, map_compat, map_quote, and remove_ignorable arguments.
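The sketch below, which assumes the utf8 package is installed, contrasts plain NFC with compatibility mapping. A base letter followed by a combining mark is composed into a single code point, but a compatibility character such as a ligature passes through NFC unchanged unless map_compat = TRUE.

```r
library(utf8)

decomposed <- "e\u0301"          # "e" followed by U+0301 COMBINING ACUTE ACCENT
composed   <- utf8_normalize(decomposed)
nchar(decomposed)                # 2 code points
nchar(composed)                  # 1 code point
composed == "\u00e9"             # TRUE: composed into U+00E9

# NFC leaves compatibility characters alone; map_compat = TRUE expands them.
lig <- "\ufb01"                  # U+FB01 LATIN SMALL LIGATURE FI
utf8_normalize(lig) == lig                # TRUE
utf8_normalize(lig, map_compat = TRUE)    # "fi"
```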

Value

The result is a character object with the same attributes as x but with Encoding set to "UTF-8".
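A quick check of the return value, assuming the utf8 package is installed: names are an attribute of x and should carry over to the result, and elements that contain non-ASCII text come back marked as UTF-8.

```r
library(utf8)

# A named character vector; names are an attribute of x.
x <- c(first = "caf\u00e9", second = "e\u0301t\u00e9")
y <- utf8_normalize(x)

names(y)       # "first" "second": attributes carried over from x
Encoding(y)    # "UTF-8" "UTF-8": both elements contain non-ASCII text
```

Note that base R never reports a declared encoding for pure-ASCII strings, so `Encoding()` shows "unknown" for any element of the result that happens to be ASCII-only.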

Examples

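library(utf8)
# U+00C5, "A" + U+030A COMBINING RING ABOVE, and U+212B ANGSTROM SIGN all normalize to U+00C5
angstrom <- c("\u00c5", "\u0041\u030a", "\u212b")
utf8_normalize(angstrom) == "\u00c5"  # TRUE TRUE TRUE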
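Flags can also be combined. The sketch below, assuming the utf8 package is installed, builds comparison keys for user-typed text so that differences in letter case and apostrophe style do not prevent a match. The input vector is a hypothetical example.

```r
library(utf8)

# Hypothetical user input differing only in case and apostrophe style
typed <- c("Don\u2019t", "don't", "DON'T")
keys  <- utf8_normalize(typed, map_case = TRUE, map_quote = TRUE)

unique(keys)   # "don't": all three spellings map to the same key
```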

utf8

Unicode Text Processing

v1.2.1

Apache License (== 2.0) | file LICENSE

Authors

Patrick O. Perry [aut, cph], Kirill Müller [cre], Unicode, Inc. [cph, dtc] (Unicode Character Database)
