
encoding

detect the encoding of texts


Description

Detect the encoding of the texts in a character vector, corpus, or readtext object and report the most likely encoding for each document. This is useful for identifying the encoding of input texts, so that the source encoding can be (re)specified when reading a set of texts with readtext, before constructing a corpus.

Usage

encoding(x, verbose = TRUE, ...)

Arguments

x

character vector, corpus, or readtext object whose texts' encodings will be detected.

verbose

if FALSE, do not print the diagnostic report

...

additional arguments passed to stri_enc_detect

Details

Based on stri_enc_detect, which is in turn based on the ICU libraries. See the ICU User Guide, http://userguide.icu-project.org/conversion/detection.
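Since the detection is delegated to stri_enc_detect, it can help to see what that function returns on its own. The following is a minimal sketch, assuming the stringi package is installed; the sample sentence and the variable names are made up for illustration, and the guesses ICU returns for a text this short may not be reliable.

```r
library(stringi)

# Hypothetical sample: a Russian sentence, converted to Windows-1251 bytes
txt_utf8 <- "Это пример русского текста, который сохранен в кодировке Windows-1251."
raw_1251 <- stri_encode(txt_utf8, from = "UTF-8", to = "windows-1251",
                        to_raw = TRUE)[[1]]

# stri_enc_detect returns one element per input text; each element lists
# candidate encodings together with a language guess and a confidence score
guess <- stri_enc_detect(raw_1251)[[1]]
head(guess, 3)
```

The candidates are ordered by confidence, so the first row is the most likely encoding according to ICU. Longer texts generally give the detector more to work with.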

Examples

## Not run: 
encoding(data_char_encodedtexts)
# show detected value for each text, versus known encoding
data.frame(labelled = names(data_char_encodedtexts), 
           detected = encoding(data_char_encodedtexts)$all)

# Russian text, Windows-1251
myreadtext <- readtext("https://kenbenoit.net/files/01_er_5.txt")
encoding(myreadtext)

## End(Not run)
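
Once a likely source encoding has been identified, the texts can be read again with that encoding specified. The sketch below assumes the readtext and stringi packages are installed and uses a temporary file with made-up content in place of a real Windows-1251 document; the file and variable names are illustrative only.

```r
library(readtext)

# Hypothetical sample: write a Russian sentence to a temporary .txt file
# as Windows-1251 bytes, standing in for a real input document
txt_utf8 <- "Это пример русского текста для проверки кодировки."
tmp <- tempfile(fileext = ".txt")
writeBin(stringi::stri_encode(txt_utf8, from = "UTF-8", to = "windows-1251",
                              to_raw = TRUE)[[1]], tmp)

# Re-read the file, telling readtext which source encoding to convert from
rt <- readtext(tmp, encoding = "windows-1251")
rt$text
```

The value passed to the encoding argument would normally be the name reported as most likely by encoding(), checked against what is known about the source of the files.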

readtext

Import and Handling for Plain and Formatted Text Files

v0.80
GPL-3
Authors
Kenneth Benoit [aut, cre, cph], Adam Obeng [aut], Kohei Watanabe [ctb], Akitaka Matsuo [ctb], Paul Nulty [ctb], Stefan Müller [ctb]
