Extract or consolidate noun phrases from parsed documents
From an object parsed by spacy_parse, extract the multi-word noun phrases as a separate object, or convert the multi-word noun phrases into a single "token" consisting of their concatenated elements.
nounphrase_extract(x, concatenator = "_")
nounphrase_consolidate(x, concatenator = "_")
x | output from spacy_parse (called with nounphrase = TRUE)
concatenator | the character(s) used to join elements of multi-word noun phrases
nounphrase_extract returns a data.frame of all noun phrases, containing the following fields:
doc_id | name of the document containing the noun phrase
sentence_id | the ID of the sentence, within the document, that contains the noun phrase
nounphrase | the noun phrase
root | the root token of the noun phrase
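To make these four fields concrete, here is a minimal base-R sketch of what extraction amounts to. It is not spacyr's implementation and does not call spaCy. The input is a hand-built, hypothetical stand-in for spacy_parse output in which two made-up columns replace the way spacyr really marks noun phrases: np_id (which noun phrase a token belongs to, NA if none) and np_root (whether the token is that phrase's root). The function name extract_nounphrases_sketch is likewise hypothetical, and the noun-phrase boundaries are assumed for illustration.

```r
# Hypothetical stand-in for spacy_parse(txt, nounphrase = TRUE) output.
# np_id and np_root are NOT real spacyr columns; they are assumed here to mark
# noun-phrase membership (np_id, unique across the table) and the phrase root.
parsed <- data.frame(
  doc_id      = "text1",
  sentence_id = 1L,
  token_id    = 1:12,
  token       = c("The", "House", "of", "Representatives", "voted", "to",
                  "suspend", "aid", "to", "South", "Dakota", "."),
  np_id       = c(1L, 1L, NA, 2L, NA, NA, NA, 3L, NA, 4L, 4L, NA),
  np_root     = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE,
                  FALSE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one output row per marked noun phrase.
extract_nounphrases_sketch <- function(x, concatenator = "_") {
  np <- x[!is.na(x$np_id), ]        # keep only tokens inside a noun phrase
  groups <- split(np, np$np_id)     # one data.frame per noun phrase
  rows <- lapply(groups, function(g) {
    data.frame(
      doc_id      = g$doc_id[1],
      sentence_id = g$sentence_id[1],
      nounphrase  = paste(g$token, collapse = concatenator),  # join tokens
      root        = g$token[g$np_root][1],                    # the root token
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

print(extract_nounphrases_sketch(parsed))
```

Each marked phrase becomes one row with its tokens joined by concatenator, so "South" and "Dakota" come back as South_Dakota, with Dakota as the root.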
nounphrase_consolidate returns a modified data.frame of parsed results, where the noun phrases have been combined into a single "token". Currently, dependency parsing is removed when this consolidation occurs.
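The consolidation step can be sketched in base R without calling spaCy. This is an illustration under stated assumptions, not spacyr's code: the input table, its np_id column (the noun phrase a token belongs to, NA if none) and the consolidate_nounphrases_sketch function are all hypothetical, and the dependency values are illustrative rather than real spaCy output. Tokens sharing an np_id collapse into one row joined by concatenator, token_id is renumbered within each sentence, and the head_token_id and dep_rel columns are dropped, mirroring the documented loss of dependency information.

```r
# Hypothetical stand-in for parsed output; np_id is NOT a real spacyr column.
parsed <- data.frame(
  doc_id        = "text1",
  sentence_id   = 1L,
  token_id      = 1:9,
  token         = c("Mr.", "Smith", "moved", "to", "San", "Francisco",
                    "in", "December", "."),
  head_token_id = c(2L, 3L, 3L, 3L, 6L, 4L, 3L, 7L, 3L),
  dep_rel       = c("compound", "nsubj", "ROOT", "prep", "compound",
                    "pobj", "prep", "pobj", "punct"),
  np_id         = c(1L, 1L, NA, NA, 2L, 2L, NA, 3L, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: merge each noun phrase into a single token row.
consolidate_nounphrases_sketch <- function(x, concatenator = "_") {
  # Tokens of one noun phrase share a key; every other token gets its own key.
  key <- ifelse(is.na(x$np_id),
                paste0("tok", seq_len(nrow(x))),
                paste0("np", x$np_id))
  key <- factor(key, levels = unique(key))   # keep original token order
  tokens <- tapply(x$token, key, paste, collapse = concatenator)
  first <- !duplicated(key)                  # first row of each group
  # Only id columns are kept, so head_token_id and dep_rel are dropped.
  out <- x[first, c("doc_id", "sentence_id"), drop = FALSE]
  out$token <- as.vector(tokens)             # levels are already in row order
  out$token_id <- ave(seq_len(nrow(out)), out$doc_id, out$sentence_id,
                      FUN = seq_along)       # renumber within each sentence
  out <- out[, c("doc_id", "sentence_id", "token_id", "token")]
  rownames(out) <- NULL
  out
}

print(consolidate_nounphrases_sketch(parsed))
```

Giving non-phrase tokens their own keys lets a single tapply call both merge phrases and pass ordinary tokens through unchanged.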
spacy_initialize()

# noun phrase extraction
txt <- "Mr. Smith moved to San Francisco in December."
parsed <- spacy_parse(txt, nounphrase = TRUE)
nounphrase_extract(parsed)

# consolidating multi-word noun phrases
txt <- "The House of Representatives voted to suspend aid to South Dakota."
parsed <- spacy_parse(txt, nounphrase = TRUE)
nounphrase_consolidate(parsed)