cleanNLP-package

cleanNLP: A Tidy Data Model for Natural Language Processing


Description

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Several NLP backends are supported, and their output is standardized into a common format. Options include stringi (very fast, but only provides tokenization), udpipe (fast, many languages, includes part-of-speech tags and dependencies), coreNLP (using its Python backend), and spacy (Python backend; includes named entity recognition).

Details

Once the package is set up, run one of cnlp_init_stringi, cnlp_init_spacy, cnlp_init_corenlp, or cnlp_init_udpipe to load the desired NLP backend. After this function is done running, use cnlp_annotate to run the annotation engine over a corpus of text. The package vignettes provide more detailed set-up information.
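
The two steps above can be sketched as follows. This is a minimal illustration, assuming cleanNLP 3.x and the stringi package are installed; the document ids and sentences are made up for the example, and the token element of the result reflects the list returned by cnlp_annotate in the 3.x series, so treat it as an assumption on other versions.

```r
library(cleanNLP)

# Step 1: load a backend. stringi needs no external models or Python.
cnlp_init_stringi()

# Step 2: annotate a corpus. The doc_id values are illustrative only.
corpus <- data.frame(
  doc_id = c("doc1", "doc2"),
  text = c("This is a sentence.", "Here is something else to parse!"),
  stringsAsFactors = FALSE
)
anno <- cnlp_annotate(corpus)

# The result is a named list of data frames; inspect what came back.
names(anno)
head(anno$token)
```

Swapping cnlp_init_stringi for another cnlp_init_* function should leave the rest of the sketch unchanged, since the backends share the same output format; the other backends need extra set-up, which the vignettes describe.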

Author(s)

Maintainer: Taylor B. Arnold taylor.arnold@acm.org

Examples

## Not run: 
library(cleanNLP)

# load the annotation engine
cnlp_init_stringi()

# build a small corpus with one document per row
input <- data.frame(
 text=c(
   "This is a sentence.",
   "Here is something else to parse!"
 ),
 stringsAsFactors=FALSE
)

# annotate your text
anno <- cnlp_annotate(input)

## End(Not run)

Version: 3.0.3
License: LGPL-2
Authors: Taylor B. Arnold [aut, cre]