corpustools: udpipe_simplify – R documentation

udpipe_simplify

Simplify tokenIndex created with the udpipe parser

Description

This is an off-the-shelf implementation of several rsyntax transformations for simplifying text.

Usage

udpipe_simplify(
  tokens,
  split_conj = TRUE,
  rm_punct = FALSE,
  new_sentences = FALSE,
  rm_mark = FALSE
)

Arguments

`tokens`	A tokenIndex, based on output from the udpipe parser.
`split_conj`	If TRUE, split conjunctions into separate sentences.
`rm_punct`	If TRUE, remove punctuation afterwards.
`new_sentences`	If TRUE, assign new sentence and token_id values after splitting.
`rm_mark`	If TRUE, remove children with a mark relation if this relation is used in the simplification.
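
The arguments above can be overridden when the function is applied to a whole corpus. The following is a minimal sketch, not taken from the package documentation: it assumes the corpustools and rsyntax packages are installed, and that `transform_rsyntax()` forwards additional named arguments to the function it applies.

```r
library(corpustools)

# Work on a copy of the bundled demo corpus (parsed with udpipe)
tc <- tc_sotu_udpipe$copy()

# Assumption: extra named arguments are passed through to udpipe_simplify.
# Here conjunctions are split and punctuation is removed afterwards.
tc2 <- transform_rsyntax(tc, udpipe_simplify,
                         split_conj = TRUE,
                         rm_punct = TRUE)

# Inspect the first rows of the transformed token data
head(tc2$tokens)
```

Leaving `new_sentences` at its default keeps the original sentence numbering, which makes it easier to compare a sentence before and after simplification.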

Value

A tokenIndex.

Examples

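if (interactive()) {
  # Work on a copy so the bundled demo corpus is left unchanged
  tc = tc_sotu_udpipe$copy()
  tc2 = transform_rsyntax(tc, udpipe_simplify)

  # Browse the simplified texts
  browse_texts(tc2)

  # Compare the dependency tree of sentence 20 before and after simplification
  rsyntax::plot_tree(tc_sotu_udpipe$tokens, token, lemma, POS, sentence_i = 20)
  rsyntax::plot_tree(tc2$tokens, token, lemma, POS, sentence_i = 20)
}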

corpustools

Managing, Querying and Analyzing Tokenized Text

v0.4.10

GPL-3

Authors

Kasper Welbers and Wouter van Atteveldt

Initial release

2022-05-03
