Extract or consolidate entities from parsed documents
From an object parsed by spacy_parse, extract the entities as a
separate object, or convert each multi-word entity into a single "token"
consisting of the concatenated elements of that entity.
entity_extract(x, type = c("named", "extended", "all"), concatenator = "_")
entity_consolidate(x, concatenator = "_")
x | output from spacy_parse
type | type of named entities, either "named", "extended", or "all"
concatenator | the character(s) used to join the elements of multi-word named entities
entity_extract returns a data.frame of all named entities, containing the
following fields:
doc_id
name of the document containing the entity
sentence_id
ID of the sentence containing the entity, within the document
entity
the named entity
entity_type
type of the named entity (e.g. PERSON, ORG, PERCENT)
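The shape of that result can be illustrated without spaCy. The following is a minimal base R sketch, not the package's implementation: the input data.frame is hand-built, and its column names and the "_B"/"_I" entity tag suffixes are assumptions about what spacy_parse produces. The helper name extract_sketch is hypothetical.

```r
# Hand-built stand-in for spacy_parse() output (assumed column names and
# assumed "_B" = begins entity, "_I" = continues entity tag suffixes).
parsed <- data.frame(
  doc_id      = "text1",
  sentence_id = 1L,
  token       = c("Mr.", "Smith", "moved", "to", "San", "Francisco", "."),
  entity      = c("", "PERSON_B", "", "", "GPE_B", "GPE_I", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one output row per entity, multi-word entities joined.
extract_sketch <- function(parsed, concatenator = "_") {
  # Keep only tokens that carry an entity tag.
  tagged <- parsed[parsed$entity != "", ]
  # Each "_B" tag opens a new entity; "_I" tags stay in the current one.
  group <- cumsum(grepl("_B$", tagged$entity))
  first <- !duplicated(group)
  data.frame(
    doc_id      = tagged$doc_id[first],
    sentence_id = tagged$sentence_id[first],
    entity      = unname(vapply(split(tagged$token, group), paste,
                                character(1), collapse = concatenator)),
    entity_type = sub("_[BI]$", "", tagged$entity[first]),
    stringsAsFactors = FALSE
  )
}

entities <- extract_sketch(parsed)
```

Here entities has one row for "Smith" and one for "San_Francisco", with the four fields listed above.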
entity_consolidate returns a modified data.frame of parsed results, where
the named entities have been combined into a single "token". Currently,
dependency parsing is removed when this consolidation occurs.
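To make the consolidation step concrete, here is a minimal base R sketch of the idea for a single document. It is not the package's implementation: the input is a hand-built data.frame, the column names and the "_B"/"_I" entity tag suffixes are assumptions about spacy_parse output, and consolidate_sketch is a hypothetical helper.

```r
# Hand-built stand-in for spacy_parse() output (assumed column names and
# assumed "_B" = begins entity, "_I" = continues entity tag suffixes).
parsed <- data.frame(
  doc_id      = "text1",
  sentence_id = 1L,
  token       = c("aid", "to", "South", "Dakota", "."),
  entity      = c("", "", "GPE_B", "GPE_I", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: merge each multi-word entity into one row, keeping
# every non-entity token as its own row.
consolidate_sketch <- function(parsed, concatenator = "_") {
  # A token starts a new row unless it continues an entity ("_I" tag).
  continues <- grepl("_I$", parsed$entity)
  group <- cumsum(!continues)
  first <- !duplicated(group)
  out <- parsed[first, c("doc_id", "sentence_id")]
  out$token <- unname(vapply(split(parsed$token, group), paste,
                             character(1), collapse = concatenator))
  out$entity_type <- sub("_[BI]$", "", parsed$entity[first])
  rownames(out) <- NULL
  out
}

consolidated <- consolidate_sketch(parsed)
```

In this sketch "South" and "Dakota" collapse into the single token "South_Dakota", and the other tokens pass through unchanged, so the result has four rows instead of five.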
spacy_initialize()

# entity extraction
txt <- "Mr. Smith of moved to San Francisco in December."
parsed <- spacy_parse(txt, entity = TRUE)
entity_extract(parsed)
entity_extract(parsed, type = "all")

# consolidating multi-word entities
txt <- "The House of Representatives voted to suspend aid to South Dakota."
parsed <- spacy_parse(txt, entity = TRUE)
entity_consolidate(parsed)