
readDOC

Read In a MS Word Document


Description

Return a function which reads in a Microsoft Word document and extracts its text.

Usage

readDOC(engine = c("antiword", "executable"), AntiwordOptions = "")

Arguments

engine

a character string for the preferred DOC extraction engine (see Details).

AntiwordOptions

a character vector of options passed to the antiword executable (used by the "executable" engine, see Details).

Details

Formally this function is a function generator: it returns a function (which reads in a text document) with a well-defined signature, and the returned function can still access the arguments passed to the generator (e.g., options to antiword) via lexical scoping.
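The function-generator pattern described above can be sketched in base R. This is an illustration only, not the tm implementation: make_reader and its prefix argument are hypothetical names, and the returned value is a plain list rather than a PlainTextDocument.

```r
# Hypothetical generator, not part of tm: it only mimics the pattern.
make_reader <- function(prefix = "") {
  # 'prefix' is captured by the returned function via lexical scoping,
  # much as readDOC() captures AntiwordOptions.
  function(elem, language, id) {
    # 'id' is accepted to keep the signature fixed, but is not used.
    lines <- readLines(elem$uri, warn = FALSE)
    list(content = paste0(prefix, lines), language = language)
  }
}

# Write a small text file so the sketch is self-contained.
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), tmp)

reader <- make_reader(prefix = "> ")
doc <- reader(elem = list(uri = tmp), language = "en", id = "id1")

print(names(formals(reader)))  # the fixed signature: elem, language, id
print(doc$content)
```

The option is fixed once, when the generator is called, so every reader it returns has the same three formals regardless of how it was configured.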

Available DOC extraction engines are as follows.

"antiword"

(default) Antiword utility as provided by the function antiword in package antiword.

"executable"

command line antiword executable, which must be installed and accessible on your system. It can convert documents from Microsoft Word versions 2, 6, 7, 97, 2000, 2002 and 2003 to plain text, and is available from http://www.winfield.demon.nl/. The character vector AntiwordOptions is passed to the executable.

Value

A function with the following formals:

elem

a list with the named component uri which must hold a valid file name.

language

a string giving the language.

id

Not used.

The function returns a PlainTextDocument representing the text and metadata extracted from elem$uri.
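A minimal usage sketch, assuming the packages tm and antiword are installed and that a Word file exists at the given path. The helper read_doc_if_available and the file name example.doc are hypothetical; the helper returns NULL instead of failing when a prerequisite is missing.

```r
# Hypothetical helper, not part of tm: returns a PlainTextDocument,
# or NULL when the required packages or the file are not available.
read_doc_if_available <- function(path, language = "en") {
  have_pkgs <- requireNamespace("tm", quietly = TRUE) &&
    requireNamespace("antiword", quietly = TRUE)
  if (!have_pkgs || !file.exists(path)) {
    return(NULL)
  }
  # readDOC() returns the reader; the reader does the actual extraction.
  reader <- tm::readDOC(engine = "antiword")
  reader(elem = list(uri = path), language = language, id = "id1")
}

doc <- read_doc_if_available("example.doc")  # hypothetical file name
if (!is.null(doc)) {
  print(inherits(doc, "PlainTextDocument"))
  print(head(as.character(doc)))
}
```

When building a corpus, the same reader would typically be supplied through the readerControl argument, e.g. readerControl = list(reader = readDOC()).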

See Also

Reader for basic information on the reader infrastructure employed by package tm.


tm: Text Mining Package

Version: v0.7-8
License: GPL-3
Authors: Ingo Feinerer [aut, cre] (<https://orcid.org/0000-0001-7656-8338>), Kurt Hornik [aut] (<https://orcid.org/0000-0003-4198-9911>), Artifex Software, Inc. [ctb, cph] (pdf_info.ps taken from GPL Ghostscript)
Initial release: 2020-11-17
