cl_charset_name

Get charset of a corpus.


Description

The encoding of a corpus is declared in its registry file (corpus property "charset"). Once a corpus is loaded, this information is available without parsing the registry file again. The function cl_charset_name offers quick access to it.

Usage

cl_charset_name(corpus, registry = Sys.getenv("CORPUS_REGISTRY"))

Arguments

corpus

Name of a CWB corpus (upper case).

registry

Path to the registry directory. Defaults to the value of the environment variable CORPUS_REGISTRY.

Examples

cl_charset_name(
  corpus = "REUTERS",
  registry = system.file(package = "RcppCWB", "extdata", "cwb", "registry")
)
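
One plausible use of the returned value is to recode input strings to the corpus encoding before passing them to other functions. The following is only a sketch under assumptions: the variable names `registry_dir`, `charset`, `token_utf8` and `token_corpus` are illustrative, and it assumes the charset name returned for the corpus is one that `iconv()` accepts on your platform.

```r
library(RcppCWB)

# Registry directory of the sample data shipped with the package
registry_dir <- system.file(package = "RcppCWB", "extdata", "cwb", "registry")

# Look up the declared encoding of the REUTERS sample corpus
charset <- cl_charset_name(corpus = "REUTERS", registry = registry_dir)

# Illustrative UTF-8 input string (not taken from the corpus)
token_utf8 <- "caf\u00e9"

# Recode to the corpus encoding; iconv() returns NA if conversion fails
# (assumption: 'charset' is a name known to iconv() on this platform)
token_corpus <- iconv(token_utf8, from = "UTF-8", to = charset)
```

If `iconv()` does not recognise the returned name, it may need to be mapped to a platform-specific name first; `iconvlist()` shows the names available locally.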

RcppCWB

'Rcpp' Bindings for the 'Corpus Workbench' ('CWB')

v0.3.2
GPL-3
Authors
Andreas Blaette [aut, cre], Bernard Desgraupes [aut], Sylvain Loiseau [aut], Oliver Christ [ctb], Bruno Maximilian Schulze [ctb], Stefan Evert [ctb], Arne Fitschen [ctb], Jeroen Ooms [ctb], Marius Bertram [ctb]
Initial release
2021-02-03