spacy_extract_entity

Extract named entities from texts using spaCy


Description

This function extracts named entities from texts, based on the entity tag (ent) attributes of document objects parsed by spaCy (see https://spacy.io/usage/linguistic-features#section-named-entities).

Usage

spacy_extract_entity(
  x,
  output = c("data.frame", "list"),
  type = c("all", "named", "extended"),
  multithread = TRUE,
  ...
)

Arguments

x

a character vector or a TIF-compliant corpus data.frame (see https://github.com/ropensci/tif)

output

type of returned object, either "list" or "data.frame".

type

type of named entities to extract, one of "named", "extended", or "all". See https://spacy.io/docs/usage/entity-recognition#entity-types for details.

multithread

logical; if TRUE, the processing is parallelized using spaCy's architecture (https://spacy.io/api)

...

unused
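
As a rough illustration of the x argument, the sketch below builds a small corpus data.frame in the shape the TIF guidelines describe: a character doc_id column first, then a character text column. The object names corpus and is_tif_like are hypothetical, and the checks are only a simplified approximation of TIF compliance, not the official validator.

```r
# Hypothetical two-document corpus in TIF-style data.frame form:
# first column doc_id, second column text, both character.
corpus <- data.frame(
  doc_id = c("doc1", "doc2"),
  text = c("The Supreme Court is located in Washington D.C.",
           "Paul earned a postgraduate degree from MIT."),
  stringsAsFactors = FALSE
)

# Simplified shape checks (an assumption, not the official TIF validator)
is_tif_like <- identical(names(corpus)[1:2], c("doc_id", "text")) &&
  is.character(corpus$doc_id) &&
  is.character(corpus$text) &&
  !anyDuplicated(corpus$doc_id)
is_tif_like
```

With spacyr loaded and initialized, a data.frame of this shape could presumably be passed as x in place of a named character vector.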

Details

When the option output = "data.frame" is selected, the function returns a data.frame with the following fields.

text

contents of entity

entity_type

type of entity (e.g. ORG for organizations)

start_id

serial number ID of the entity's starting token. This number corresponds to the token's position in the output of spacy_tokenize(x) with default options.

length

number of words (tokens) included in a named entity (e.g. for the entity "New York Stock Exchange", length = 4)
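
To make the relationship between start_id and length concrete, here is a minimal base R sketch. The token vector and the entity row are written by hand for illustration; they are not actual spaCy output, and real tokenization may differ.

```r
# Hand-written tokens for one document (illustrative, not spaCy output)
tokens <- c("She", "works", "at", "the", "New", "York", "Stock", "Exchange")

# A hand-written entity row using the fields described above
entity <- data.frame(
  text = "New York Stock Exchange",
  entity_type = "ORG",
  start_id = 5L,
  length = 4L,
  stringsAsFactors = FALSE
)

# Token positions covered by the entity:
# start_id, start_id + 1, ..., start_id + length - 1
span <- seq(entity$start_id, length.out = entity$length)

# Pasting those tokens back together recovers the entity text
recovered <- paste(tokens[span], collapse = " ")
recovered
```

Under these assumptions, the entity starts at the fifth token and spans four tokens, so the recovered string matches the text field.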

Value

either a list or a data.frame of extracted entities

Examples

spacy_initialize()

txt <- c(doc1 = "The Supreme Court is located in Washington D.C.",
         doc2 = "Paul earned a postgraduate degree from MIT.")
spacy_extract_entity(txt)
spacy_extract_entity(txt, output = "list")

spacyr

Wrapper to the 'spaCy' 'NLP' Library

v1.2.1
GPL-3
Authors
Kenneth Benoit [cre, aut, cph] (<https://orcid.org/0000-0002-0797-564X>), Akitaka Matsuo [aut] (<https://orcid.org/0000-0002-3323-6330>), European Research Council [fnd] (ERC-2011-StG 283794-QUANTESS)
Initial release