readRCV1

Read In a Reuters Corpus Volume 1 Document


Description

Read in a Reuters Corpus Volume 1 XML document.

Usage

readRCV1(elem, language, id)
readRCV1asPlain(elem, language, id)

Arguments

elem

a named list with a component named content, which must hold the document to be read in.

language

a string giving the language.

id

Not used.

Value

An XMLTextDocument for readRCV1, or a PlainTextDocument for readRCV1asPlain, representing the text and metadata extracted from elem$content.
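The difference between the two return types can be checked directly. The following is a minimal sketch, assuming the sample file rcv1_2330.xml shipped with tm is available; the variable names elem, xml_doc and plain_doc are illustrative only.

```r
library(tm)

# Sample RCV1 document shipped with tm, read as raw bytes.
f <- system.file("texts", "rcv1_2330.xml", package = "tm")
f_bin <- readBin(f, raw(), file.size(f))
elem <- list(content = f_bin)  # illustrative name for the input list

# Same input, two readers: one keeps the XML, one yields plain text.
xml_doc   <- readRCV1(elem = elem, language = "en", id = "id1")
plain_doc <- readRCV1asPlain(elem = elem, language = "en", id = "id1")

# Compare the classes of the two results.
class(xml_doc)
class(plain_doc)
```

readRCV1asPlain is likely the more convenient choice when the document is headed for plain-text transformations, since its content is already character data.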

References

Lewis, D. D.; Yang, Y.; Rose, T.; and Li, F. (2004). RCV1: A New Benchmark Collection for Text Categorization Research. Journal of Machine Learning Research, 5, 361–397. https://www.jmlr.org/papers/volume5/lewis04a/lewis04a.pdf

See Also

Reader for basic information on the reader infrastructure employed by package tm.

Examples

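library(tm)
# Locate the sample RCV1 document shipped with tm and read it as raw bytes.
f <- system.file("texts", "rcv1_2330.xml", package = "tm")
f_bin <- readBin(f, raw(), file.size(f))
# Parse the raw XML into a document object.
rcv1 <- readRCV1(elem = list(content = f_bin), language = "en", id = "id1")
# Inspect the extracted text and metadata.
content(rcv1)
meta(rcv1)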
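In practice a reader is usually passed to a corpus constructor rather than called by hand. The following is a minimal sketch of that pattern, assuming the same sample file and using DirSource with a file name pattern to select it; the names texts_dir, src and corpus are illustrative only.

```r
library(tm)

# Directory holding the sample texts shipped with tm.
texts_dir <- system.file("texts", package = "tm")

# Restrict the source to the single RCV1 sample file.
src <- DirSource(texts_dir, pattern = "rcv1_2330\\.xml$")

# Let the corpus constructor call the reader for each source element.
corpus <- VCorpus(src,
                  readerControl = list(reader = readRCV1asPlain,
                                       language = "en"))

length(corpus)
meta(corpus[[1]])
```

Here the corpus constructor supplies elem, language and id to the reader for each element, which is why these arguments rarely need to be set manually.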

tm

Text Mining Package

v0.7-8
GPL-3
Authors
Ingo Feinerer [aut, cre] (<https://orcid.org/0000-0001-7656-8338>), Kurt Hornik [aut] (<https://orcid.org/0000-0003-4198-9911>), Artifex Software, Inc. [ctb, cph] (pdf_info.ps taken from GPL Ghostscript)
Initial release
2020-11-17