spacyr: spacy_extract_nounphrases – R documentation

spacy_extract_nounphrases

Extract noun phrases from texts using spaCy

Description

This function extracts noun phrases from documents, based on the noun_chunks attribute of document objects parsed by spaCy (see https://spacy.io/usage/linguistic-features#noun-chunks).

Usage

spacy_extract_nounphrases(
  x,
  output = c("data.frame", "list"),
  multithread = TRUE,
  ...
)

Arguments

`x`	a character object or a TIF-compliant corpus data.frame (see https://github.com/ropensci/tif)
`output`	type of returned object, either `"data.frame"` or `"list"`
`multithread`	logical; if `TRUE`, the processing is parallelized using spaCy's architecture (https://spacy.io/api)
`...`	unused
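
As a rough sketch of the second input form for `x`, the snippet below builds a small corpus data.frame with a character `doc_id` column followed by a character `text` column, which is the layout the TIF specification describes. The guard around the spaCy calls is an assumption made here so the snippet does nothing harmful where spacyr or spaCy is not installed; `corpus_df` and `np` are illustrative names only.

```r
# Build a TIF-style corpus data.frame: doc_id first, text second, both character.
corpus_df <- data.frame(
  doc_id = c("doc1", "doc2"),
  text = c("Natural language processing is a branch of computer science.",
           "Paul earned a postgraduate degree from MIT."),
  stringsAsFactors = FALSE
)

# Only call spaCy in an interactive session where spacyr is available
# (assumption: spaCy itself is also installed and can be initialized).
if (interactive() && requireNamespace("spacyr", quietly = TRUE)) {
  spacyr::spacy_initialize()
  np <- spacyr::spacy_extract_nounphrases(corpus_df,
                                          output = "data.frame",
                                          multithread = FALSE)
  print(np)
  spacyr::spacy_finalize()
}
```

Setting `multithread = FALSE` here is only a choice for the sketch; the documented default is `TRUE`.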

Details

When the option output = "data.frame" is selected, the function returns a data.frame with the following fields.

text: contents of the noun phrase
root_text: contents of the root token
start_id: serial number ID of the starting token. This number corresponds to the token numbering in the data.frame returned by spacy_tokenize(x) with default options.
root_id: serial number ID of the root token
length: number of words (tokens) included in the noun phrase (e.g. for the noun phrase "individual car owners", length = 3)
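
The relationship between start_id, root_id and length can be illustrated without running spaCy. The sketch below uses a hand-written token vector and a hand-written data.frame with the fields listed above; the values are hypothetical and are not actual output of spacy_extract_nounphrases(). It also assumes that token IDs are numbered from 1 within a document. The derived end_id column is not a field returned by the function.

```r
# Hypothetical tokens of one document, numbered from 1 (an assumption).
tokens <- c("Individual", "car", "owners", "pay", "tax", ".")

# Hypothetical noun-phrase rows using the fields described above.
np <- data.frame(
  text      = c("Individual car owners", "tax"),
  root_text = c("owners", "tax"),
  start_id  = c(1L, 5L),
  root_id   = c(3L, 5L),
  length    = c(3L, 1L),
  stringsAsFactors = FALSE
)

# ID of the last token in each phrase: start plus length, minus one.
np$end_id <- np$start_id + np$length - 1L

# Rebuild each phrase from the token vector to check the arithmetic.
rebuilt <- mapply(
  function(start, end) paste(tokens[start:end], collapse = " "),
  np$start_id, np$end_id
)

# The root token should fall inside the span of its phrase.
root_in_span <- np$root_id >= np$start_id & np$root_id <= np$end_id
```

With real output, the same arithmetic could be applied per document to line noun phrases up with the tokens from spacy_tokenize(x).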

Value

either a list or a data.frame of noun phrases

Examples

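library(spacyr)
spacy_initialize()  # requires a working spaCy installation

txt <- c(doc1 = "Natural language processing is a branch of computer science.",
         doc2 = "Paul earned a postgraduate degree from MIT.")
spacy_extract_nounphrases(txt)                   # data.frame (default)
spacy_extract_nounphrases(txt, output = "list")  # list
spacy_finalize()    # shut down the spaCy session when finished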

spacyr

Wrapper to the 'spaCy' 'NLP' Library

v1.2.1

GPL-3

Authors

Kenneth Benoit [cre, aut, cph] (<https://orcid.org/0000-0002-0797-564X>), Akitaka Matsuo [aut] (<https://orcid.org/0000-0002-3323-6330>), European Research Council [fnd] (ERC-2011-StG 283794-QUANTESS)

Initial release