spacyr: nounphrase_extract – R documentation

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

nounphrase_extract

Extract or consolidate noun phrases from parsed documents

Description

From an object parsed by spacy_parse, extract the multi-word noun phrases as a separate object, or convert the multi-word noun phrases into single "token" consisting of the concatenated elements of the multi-word noun phrases.

Usage

nounphrase_extract(x, concatenator = "_")

nounphrase_consolidate(x, concatenator = "_")

Arguments

`x`	output from `spacy_parse`
`concatenator`	the character(s) used to join elements of multi-word noun phrases

Value

noun returns a data.frame of all named entities, containing the following fields:

doc_id name of the document containing the noun phrase
sentence_id the sentence ID containing the noun phrase, within the document
nounphrasethe noun phrase
root the root token of the noun phrase

nounphrase_consolidate returns a modified data.frame of parsed results, where the noun phrases have been combined into a single "token". Currently, dependency parsing is removed when this consolidation occurs.

Examples

spacy_initialize()

# entity extraction
txt <- "Mr. Smith of moved to San Francisco in December."
parsed <- spacy_parse(txt, nounphrase = TRUE)
entity_extract(parsed)


# consolidating multi-word noun phrases
txt <- "The House of Representatives voted to suspend aid to South Dakota."
parsed <- spacy_parse(txt, nounphrase = TRUE)
nounphrase_consolidate(parsed)

spacyr

Wrapper to the 'spaCy' 'NLP' Library

v1.2.1

GPL-3

Authors

Kenneth Benoit [cre, aut, cph] (<https://orcid.org/0000-0002-0797-564X>), Akitaka Matsuo [aut] (<https://orcid.org/0000-0002-3323-6330>), European Research Council [fnd] (ERC-2011-StG 283794-QUANTESS)

Initial release