
as_tokenindex

Prepare a tokenIndex


Description

Creates a tokenIndex data.table. Accepts any data.frame, provided that the required columns (doc_id, sentence, token_id, parent, relation) are present. The name of each of these columns must be one of the candidate values given in the corresponding argument.

The data in the data.frame will not be changed, with three exceptions. First, the column names will be changed to the default names if non-default names are used. Second, if a token has itself as its parent (which some parsers use to indicate the root), the parent is set to NA (as in other parsers) to prevent infinite cycles. Third, the data will be sorted by doc_id, sentence, token_id.
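The three changes described above can be illustrated with a minimal base R sketch. This is not the rsyntax implementation: `sketch_tokenindex` is a hypothetical helper written only for illustration, it returns a plain data.frame rather than a tokenIndex data.table, and the input data is made up.

```r
# Hypothetical helper (NOT part of rsyntax): mimics the three documented changes.
sketch_tokenindex <- function(tokens,
                              doc_id = c("doc_id", "document_id"),
                              sentence = c("sentence", "sentence_id"),
                              token_id = c("token_id"),
                              parent = c("parent", "head_token_id"),
                              relation = c("relation", "dep_rel")) {
  candidates <- list(doc_id = doc_id, sentence = sentence, token_id = token_id,
                     parent = parent, relation = relation)
  for (target in names(candidates)) {
    found <- intersect(candidates[[target]], names(tokens))
    if (length(found) == 0) stop("no column found for '", target, "'")
    # 1. Rename the first matching candidate column to the default name
    names(tokens)[names(tokens) == found[1]] <- target
  }
  # 2. A token listed as its own parent marks the root: set its parent to NA
  self <- !is.na(tokens$parent) & tokens$parent == tokens$token_id
  tokens$parent[self] <- NA
  # 3. Sort by doc_id, sentence, token_id
  ord <- order(tokens$doc_id, tokens$sentence, tokens$token_id)
  tokens <- tokens[ord, , drop = FALSE]
  rownames(tokens) <- NULL
  tokens
}

# Made-up parser output: non-default column names, unsorted,
# and a root token (token 2) that points to itself.
raw <- data.frame(
  document_id = c("d1", "d1", "d1"),
  sentence_id = c(1, 1, 1),
  token_id = c(3, 1, 2),
  head_token_id = c(2, 2, 2),
  dep_rel = c("dobj", "nsubj", "ROOT"),
  stringsAsFactors = FALSE
)
result <- sketch_tokenindex(raw)
print(result)
```

In the result the columns carry the default names, the rows are ordered by token_id, and the root token has NA as its parent. All other values are left as they were.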

Usage

as_tokenindex(
  tokens,
  doc_id = c("doc_id", "document_id"),
  sentence = c("sentence", "sentence_id"),
  token_id = c("token_id"),
  parent = c("parent", "head_token_id"),
  relation = c("relation", "dep_rel")
)

Arguments

tokens

A data.frame, data.table, or tokenIndex.

doc_id

candidate names for the document id column

sentence

candidate names for the sentence (id/index) column

token_id

candidate names for the token id column. Has to be numeric (note that some parsers return token ids as numbers with a prefix, such as t_1 or w_1)

parent

candidate names for the parent id column. Has to be numeric.

relation

candidate names for the relation column
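Because token_id and parent have to be numeric, prefixed ids need to be converted before the data is passed on. The following is a minimal base R sketch, assuming the prefix consists only of non-digit characters (as in t_1 or w_1). The helper `strip_prefix` and the example data are hypothetical and not part of rsyntax.

```r
# Made-up parser output with prefixed token ids (assumption: prefix has no digits)
tokens <- data.frame(
  doc_id = "d1",
  sentence = 1,
  token_id = c("t_1", "t_2", "t_3"),
  parent = c("t_2", NA, "t_2"),
  relation = c("nsubj", "ROOT", "dobj"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop the leading non-digit prefix and convert to numeric.
# NA values (e.g. the parent of the root) stay NA.
strip_prefix <- function(x) as.numeric(sub("^[^0-9]+", "", x))

tokens$token_id <- strip_prefix(tokens$token_id)
tokens$parent <- strip_prefix(tokens$parent)
str(tokens)
```

The same conversion has to be applied to both columns, since parent refers to values in token_id.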

Value

a tokenIndex

Examples

library(rsyntax)

# tokens_corenlp is an example dataset included in rsyntax
as_tokenindex(tokens_corenlp)

rsyntax

Extract Semantic Relations from Text by Querying and Reshaping Syntax

v0.1.1
GPL-3
Authors
Kasper Welbers and Wouter van Atteveldt
Initial release
2020-10-22
