Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

txt_previousgram

Based on a vector with a word sequence, get n-grams (looking backward)


Description

If you have annotated your text using udpipe_annotate, your text is tokenised in a sequence of words. Based on this vector of words in sequence getting n-grams comes down to looking at the previous word and the subsequent previous word andsoforth. These words can be pasted together to form an n-gram containing the second previous word, the previous word, the current word ...

Usage

txt_previousgram(x, n = 2, sep = " ")

Arguments

x

a character vector where each element is just 1 term or word

n

an integer indicating the ngram. Values of 1 will keep the x, a value of 2 will append the previous term to the current term, a value of 3 will append the second previous term term and the previous term preceding the current term to the current term

sep

a character element indicating how to paste the subsequent words together

Value

a character vector of the same length of x with the n-grams

See Also

Examples

x <- sprintf("%s%s", LETTERS, 1:26)
txt_previousgram(x, n = 2)

data.frame(words = x,
           bigram = txt_previousgram(x, n = 2),
           trigram = txt_previousgram(x, n = 3, sep = "-"),
           quatrogram = txt_previousgram(x, n = 4, sep = ""),
           stringsAsFactors = FALSE)

x <- c("A1", "A2", "A3", NA, "A4", "A5")
data.frame(x, 
           bigram = txt_previousgram(x, n = 2, sep = "_"),
           stringsAsFactors = FALSE)

udpipe

Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

v0.8.5
MPL-2.0
Authors
Jan Wijffels [aut, cre, cph], BNOSAC [cph], Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic [cph], Milan Straka [ctb, cph], Jana Straková [ctb, cph]
Initial release

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.