entity_extract

Extract or consolidate entities from parsed documents


Description

From an object parsed by spacy_parse, extract the entities as a separate object, or convert each multi-word entity into a single "token" formed by concatenating its elements.

Usage

entity_extract(x, type = c("named", "extended", "all"), concatenator = "_")

entity_consolidate(x, concatenator = "_")

Arguments

x

output from spacy_parse.

type

type of named entities to return, one of "named", "extended", or "all"; the first, "named", is the default. See https://spacy.io/docs/usage/entity-recognition#entity-types for details.

concatenator

the character(s) used to join the elements of multi-word named entities
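
The effect of the type argument can be pictured as a filter on the entity_type column. The sketch below is base R only and is not spacyr's code: filter_by_type and extended_types are hypothetical names, the data frame is hand-built, and the split of labels into "extended" (dates, times, and numeric labels) versus "named" (everything else) is an assumption based on spaCy's entity scheme.

```r
# Hand-built stand-in for entity_extract() output; values are illustrative only.
entities <- data.frame(
  entity      = c("Smith", "San_Francisco", "December"),
  entity_type = c("PERSON", "GPE", "DATE"),
  stringsAsFactors = FALSE
)

# Assumption: these labels count as "extended"; all other labels count as "named".
extended_types <- c("DATE", "TIME", "PERCENT", "MONEY",
                    "QUANTITY", "ORDINAL", "CARDINAL")

# Hypothetical helper, not part of spacyr.
filter_by_type <- function(x, type = c("named", "extended", "all")) {
  type <- match.arg(type)  # picks "named" when type is not supplied
  is_extended <- x$entity_type %in% extended_types
  switch(type,
    named    = x[!is_extended, , drop = FALSE],
    extended = x[is_extended, , drop = FALSE],
    all      = x
  )
}

named_only    <- filter_by_type(entities)
extended_only <- filter_by_type(entities, type = "extended")
everything    <- filter_by_type(entities, type = "all")
```

Under these assumptions, the default call keeps the person and place rows, while type = "extended" keeps only the date row.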

Value

entity_extract returns a data.frame of all named entities, containing the following fields:

  • doc_id name of the document containing the entity

  • sentence_id ID of the sentence containing the entity, within the document

  • entity the named entity

  • entity_type type of the named entity (e.g. PERSON, ORG, PERCENT)
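
To make the shape of this return value concrete, here is a minimal base R sketch that builds a data frame with the same four fields from hand-written token-level input. It assumes the parsed object carries an entity column with tags such as "GPE_B" (beginning of an entity) and "GPE_I" (inside one); join_entities is a hypothetical helper and not spacyr's implementation.

```r
# Hand-built stand-in for spacy_parse(txt, entity = TRUE); values are illustrative only.
parsed <- data.frame(
  doc_id      = "text1",
  sentence_id = 1L,
  token       = c("Mr.", "Smith", "moved", "to", "San", "Francisco",
                  "in", "December", "."),
  entity      = c("", "PERSON_B", "", "", "GPE_B", "GPE_I", "", "DATE_B", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of spacyr: one output row per entity.
join_entities <- function(x, concatenator = "_") {
  tagged <- x[x$entity != "", ]
  # A new entity starts at every tag ending in "_B".
  group <- cumsum(grepl("_B$", tagged$entity))
  pieces <- split(tagged, group)
  do.call(rbind, lapply(pieces, function(p) {
    data.frame(
      doc_id      = p$doc_id[1],
      sentence_id = p$sentence_id[1],
      entity      = paste(p$token, collapse = concatenator),
      entity_type = sub("_[BI]$", "", p$entity[1]),
      stringsAsFactors = FALSE
    )
  }))
}

result <- join_entities(parsed)
spaced <- join_entities(parsed, concatenator = " ")
```

With the default concatenator the two-token place name becomes San_Francisco; passing concatenator = " " keeps it as San Francisco.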

entity_consolidate returns a modified data.frame of parsed results, in which each multi-word named entity has been combined into a single "token". Currently, dependency parsing information is removed when this consolidation occurs.
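
Consolidation can be sketched in the same spirit: non-entity tokens stay as they are, and the tokens of each multi-word entity collapse into one row. This is a base R illustration only, with a hypothetical helper (consolidate_sketch), a hand-built single-document input, and an assumed "_B"/"_I" tagging of the entity column; the columns of the real return value may differ.

```r
# Hand-built stand-in for part of a parsed sentence; values are illustrative only.
parsed <- data.frame(
  doc_id      = "text1",
  sentence_id = 1L,
  token       = c("aid", "to", "South", "Dakota", "."),
  entity      = c("", "", "GPE_B", "GPE_I", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of spacyr. Handles a single document only.
consolidate_sketch <- function(x, concatenator = "_") {
  # Every token opens a new group unless it continues an entity ("_I" tag).
  group <- cumsum(!grepl("_I$", x$entity))
  first <- !duplicated(group)
  out <- x[first, c("doc_id", "sentence_id")]
  # Join the tokens of each group; single-token groups are left unchanged.
  out$token <- as.vector(tapply(x$token, group, paste, collapse = concatenator))
  out$entity_type <- sub("_[BI]$", "", x$entity[first])
  # Renumber tokens, since the merged rows shift later positions.
  out$token_id <- seq_len(nrow(out))
  rownames(out) <- NULL
  out
}

consolidated <- consolidate_sketch(parsed)
```

Five input tokens become four rows here, with South and Dakota merged into South_Dakota and labelled GPE.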

Examples

spacy_initialize()

# entity extraction
txt <- "Mr. Smith moved to San Francisco in December."
parsed <- spacy_parse(txt, entity = TRUE)
entity_extract(parsed)
entity_extract(parsed, type = "all")


# consolidating multi-word entities 
txt <- "The House of Representatives voted to suspend aid to South Dakota."
parsed <- spacy_parse(txt, entity = TRUE)
entity_consolidate(parsed)

spacyr

Wrapper to the 'spaCy' 'NLP' Library

v1.2.1
GPL-3
Authors
Kenneth Benoit [cre, aut, cph] (<https://orcid.org/0000-0002-0797-564X>), Akitaka Matsuo [aut] (<https://orcid.org/0000-0002-3323-6330>), European Research Council [fnd] (ERC-2011-StG 283794-QUANTESS)
Initial release