
export_span_annotations

Export span annotations


Description

Export columns from a tCorpus as span annotations (annotations over a span of text). The annotations are returned as a data.table in which each row is one annotation, with the columns doc_id, variable, value, field, offset, length and text. The key purpose is to link each annotation to exact character positions in the text. The function can therefore only be used if position information is available, i.e. if remember_spaces=TRUE was set when the tCorpus was created.

Usage

export_span_annotations(tc, variables)

Arguments

tc

A tCorpus, created with create_tcorpus, where remember_spaces must have been set to TRUE

variables

A character vector with variables (columns in tc$tokens) to export

Details

Note that spans with gaps in them (e.g. spans based on proximity queries) are split into separate annotations, so some information can be lost.
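
The effect of splitting can be illustrated with a small base R sketch. This is not the package's implementation. The token positions are hypothetical, and the sketch only shows the general idea of cutting a gapped span into contiguous runs.

```r
# Hypothetical token positions matched by a single query hit.
# Positions 4 and 5 are missing, so the span has a gap.
token_id <- c(1, 2, 3, 6, 7)

# Start a new run wherever a position does not directly follow the previous one.
run <- cumsum(c(TRUE, diff(token_id) != 1))

# Each contiguous run would become its own annotation.
pieces <- split(token_id, run)
print(pieces)
```

Here the single gapped span yields two pieces, and nothing in the pieces themselves records that they came from the same hit.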

Value

A data.table where each row is a span annotation, with columns: doc_id, variable, value, field, offset, length, text
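
As a rough sketch of how offset and length relate to character positions, the helper below cuts a span out of a string. The function span_text and its zero_based argument are hypothetical and not part of corpustools. Whether offsets count from 0 or 1 is an assumption here, so it is worth comparing the result against the text column of the exported table before relying on it.

```r
# Hypothetical helper, not part of corpustools.
# Assumption: `offset` counts characters from the start of the field text,
# starting at 0 when zero_based = TRUE and at 1 otherwise.
span_text <- function(text, offset, length, zero_based = TRUE) {
  start <- if (zero_based) offset + 1 else offset
  substr(text, start, start + length - 1)
}

doc <- "We are at war, but we want peace."

# Under the zero-based assumption, "war" starts at offset 10 and is 3 characters long.
span_text(doc, offset = 10, length = 3)
```

If the extracted string does not match the text column for the same row, the other offset convention is probably the right one.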

Examples

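# Create a tCorpus from the State of the Union texts, keeping position information
tc = create_tcorpus(sotu_texts, c('president','text'), doc_column='id', remember_spaces=TRUE)
# Code tokens with two queries; the label before '#' is stored in the 'code' column
tc$code_features(c('war# war peace', 'us being# <(i we) (am are)>'))
# Export the 'code' column as span annotations
export_span_annotations(tc, 'code')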

corpustools

Managing, Querying and Analyzing Tokenized Text

v0.4.10
GPL-3
Authors
Kasper Welbers and Wouter van Atteveldt
Initial release
2022-05-03
