cleanNLP: cleanNLP-package – R documentation

cleanNLP-package

cleanNLP: A Tidy Data Model for Natural Language Processing

Description

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Several NLP backends are supported, and their output is standardized into the same normalized format. Options include stringi (very fast, but tokenization only), udpipe (fast, many languages, includes part-of-speech tags and dependencies), coreNLP (via its Python backend), and spacy (Python backend; includes named entity recognition).

Details

Once the package is set up, run one of cnlp_init_stringi, cnlp_init_spacy, cnlp_init_corenlp, or cnlp_init_udpipe to load the desired NLP backend. When that function returns, call cnlp_annotate to run the annotation engine over a corpus of text. The package vignettes provide more detailed set-up information.
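
The two-step workflow described above (initialize a backend, then annotate) can be sketched as follows. This is a minimal illustration rather than the package's canonical example: it assumes cleanNLP 3.x is installed, uses the stringi backend because it needs no external models, and assumes the result is a list holding `token` and `document` tables. The `doc_id` values and the `corpus` variable name are made up for illustration.

```r
# Sketch of the init-then-annotate workflow; assumes cleanNLP (>= 3.0) is installed.
if (requireNamespace("cleanNLP", quietly = TRUE)) {
  library(cleanNLP)

  # 1. Load a backend. stringi needs no external models but only tokenizes.
  cnlp_init_stringi()

  # 2. Build a corpus: one row per document, raw text in a `text` column.
  #    The `doc_id` column is optional; the ids here are hypothetical.
  corpus <- data.frame(
    doc_id = c("doc1", "doc2"),
    text = c("This is a sentence.", "Here is something else to parse!"),
    stringsAsFactors = FALSE
  )

  # 3. Run the annotation engine over the corpus.
  anno <- cnlp_annotate(corpus)

  # 4. Inspect the normalized tables (assumed element names: token, document).
  print(names(anno))
  print(head(anno$token))
}
```

Switching to another backend should only require replacing the `cnlp_init_stringi()` call with the matching `cnlp_init_*` function, provided that backend's models are installed; the vignettes cover that set-up.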

Author(s)

Maintainer: Taylor B. Arnold <taylor.arnold@acm.org>

Examples

## Not run: 
library(cleanNLP)

# load the annotation engine
cnlp_init_stringi()

# text to annotate: one row per document, raw text in the `text` column
input <- data.frame(
  text = c(
    "This is a sentence.",
    "Here is something else to parse!"
  ),
  stringsAsFactors = FALSE
)

# annotate your text
anno <- cnlp_annotate(input)

## End(Not run)

v3.0.3

LGPL-2

Authors

Taylor B. Arnold [aut, cre]

Initial release