
as_utf8

UTF-8 Character Encoding


Description

UTF-8 text encoding and validation.

Usage

as_utf8(x, normalize = FALSE)

utf8_valid(x)

Arguments

x

character object.

normalize

a logical value indicating whether to convert to Unicode composed normal form (NFC).

Details

as_utf8 converts a character object from its declared encoding to a valid UTF-8 character object, or throws an error if no conversion is possible. If normalize = TRUE, the text is also transformed to Unicode composed normal form (NFC) after conversion to UTF-8.
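A minimal sketch of what normalize = TRUE changes, assuming the utf8 package is installed. The string "cafe\u0301" spells the accented letter in decomposed form ("e" followed by U+0301 COMBINING ACUTE ACCENT); NFC recomposes it into the single code point U+00E9. The variable names are illustrative only.

```r
library(utf8)

# Decomposed spelling: "e" followed by U+0301 COMBINING ACUTE ACCENT
decomposed <- "cafe\u0301"
nchar(decomposed)      # 5 code points

# normalize = TRUE converts to UTF-8 and then applies NFC,
# which merges "e" + U+0301 into the single code point U+00E9
composed <- as_utf8(decomposed, normalize = TRUE)
nchar(composed)        # 4 code points
composed == "caf\u00e9"

# With the default normalize = FALSE the code points are left as they are
unchanged <- as_utf8(decomposed)
nchar(unchanged)       # still 5 code points
```

Note that nchar() counts code points, not user-perceived characters, which is why the two spellings report different lengths even though they render identically.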

utf8_valid tests whether the elements of a character object can be translated to valid UTF-8 strings.
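Because as_utf8 throws an error on the first element it cannot convert, one possible pattern is to check the input with utf8_valid first and convert only the valid elements. The sketch below assumes the utf8 package is installed; the names "good" and "bad" are hypothetical labels chosen for illustration, and the second element deliberately holds a latin-1 byte while being declared as UTF-8.

```r
library(utf8)

x <- c("caf\u00e9", "caf\xe9")
# The second element contains the latin-1 byte 0xE9
# but is (wrongly) declared as UTF-8
Encoding(x) <- c("UTF-8", "UTF-8")
names(x) <- c("good", "bad")

# Logical vector with the same names as x: TRUE for "good", FALSE for "bad"
ok <- utf8_valid(x)
ok

# Convert only the elements that are valid UTF-8
converted <- as_utf8(x[ok])
converted
```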

Value

For as_utf8, the result is a character object with the same attributes as x but with Encoding set to "UTF-8".

For utf8_valid, the result is a logical object with the same names, dim, and dimnames as x.
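A short sketch of the attribute behaviour described above, using a small labelled character matrix and assuming the utf8 package is installed. One caveat that comes from base R rather than from utf8: R never marks pure-ASCII strings with an encoding, so Encoding() reports "unknown" for the ASCII elements even after as_utf8; only the non-ASCII element reports "UTF-8".

```r
library(utf8)

m <- matrix(c("a", "b", "\u00e7", "d"), nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))

# utf8_valid returns a logical matrix with the same shape and labels
v <- utf8_valid(m)
identical(dim(v), dim(m))
identical(dimnames(v), dimnames(m))

# as_utf8 keeps the attributes of its input
u <- as_utf8(m)
identical(dimnames(u), dimnames(m))

# ASCII elements report "unknown"; the non-ASCII element reports "UTF-8"
Encoding(u)
```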

Examples

# the second element is encoded in latin-1, but declared as UTF-8
x <- c("fa\u00E7ile", "fa\xE7ile", "fa\xC3\xA7ile")
Encoding(x) <- c("UTF-8", "UTF-8", "bytes")

# attempt to convert to UTF-8 (fails)
## Not run: as_utf8(x)

y <- x
Encoding(y[2]) <- "latin1" # mark the correct encoding
as_utf8(y) # succeeds

# test for valid UTF-8
utf8_valid(x)

utf8

Unicode Text Processing

Version: 1.2.1
License: Apache License (== 2.0) | file LICENSE
Authors: Patrick O. Perry [aut, cph], Kirill Müller [cre], Unicode, Inc. [cph, dtc] (Unicode Character Database)
