corpustools: tokenWindowOccurence – R documentation

tokenWindowOccurence

Gives the window in which a term occurred, as a matrix.

Description

This function returns the occurrence of tokens (position.matrix) and the window of occurrence (window.matrix). This format allows the co-occurrence of tokens within sliding windows (i.e. token distance) to be calculated by multiplying position.matrix with window.matrix.
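
The multiplication described above can be illustrated with a small base R sketch. It does not call corpustools: the toy token sequence, the variable names, and the choice to include a token's own position in its window are all assumptions made for illustration. Both matrices have one row per token position and one column per term, so the cross-product yields a term-by-term co-occurrence table.

```r
# Toy sequence of five token positions (hypothetical data, one document)
tokens <- c("a", "b", "c", "a", "d")
terms <- sort(unique(tokens))
n <- length(tokens)
window_size <- 1

# Position matrix: 1 where the token at row i is the term in column t
position_mat <- matrix(0, nrow = n, ncol = length(terms),
                       dimnames = list(NULL, terms))
position_mat[cbind(seq_len(n), match(tokens, terms))] <- 1

# Window matrix: 1 where term t occurs within window_size of position i
# (assumption: the position itself counts as part of its own window)
window_mat <- matrix(0, nrow = n, ncol = length(terms),
                     dimnames = list(NULL, terms))
for (i in seq_len(n)) {
  in_window <- abs(seq_len(n) - i) <= window_size
  window_mat[i, unique(tokens[in_window])] <- 1
}

# Term-by-term co-occurrence: t(position_mat) %*% window_mat
cooc <- crossprod(position_mat, window_mat)
print(cooc)
```

Row `a` of `cooc` counts, over the two positions where `a` occurs, how often each term falls inside the window. The matrices the package returns are presumably sparse, so an equivalent product on real output would likely go through the Matrix package rather than base matrices.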

Usage

tokenWindowOccurence(
  tc,
  feature,
  context_level = c("document", "sentence"),
  window.size = 10,
  direction = "<>",
  distance_as_value = F,
  batch_rows = NULL,
  drop_empty_terms = T
)

Arguments

`tc`	a tCorpus object
`feature`	The name of the feature column
`context_level`	Select whether to use "document" or "sentence" as context boundaries
`window.size`	The distance within which tokens should occur from each other to be counted as a co-occurrence.
`direction`	a string indicating whether only the left ('<') or right ('>') side of the window, or both ('<>'), should be used.
`distance_as_value`	If TRUE, the values of the matrix will represent the shortest distance to the occurrence of a feature
`batch_rows`	Used in functions that call this function in batches
`drop_empty_terms`	If TRUE, empty terms (with zero occurrences) will be dropped
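
As a rough illustration of what `distance_as_value` refers to, the base R sketch below computes, for each position and term, the shortest distance to an occurrence of that term inside the window. It is a toy under stated assumptions, not the package's implementation: the data and the name `nearest_distance` are hypothetical, both directions are used, and how corpustools actually encodes these values in its matrices is not specified on this page.

```r
# Toy sequence of five token positions (hypothetical data, one document)
tokens <- c("a", "b", "c", "a", "d")
terms <- sort(unique(tokens))
n <- length(tokens)
window_size <- 2

# nearest_distance[i, t]: smallest |i - j| over positions j holding term t,
# or NA when no occurrence of t lies within window_size of position i
nearest_distance <- matrix(NA_integer_, nrow = n, ncol = length(terms),
                           dimnames = list(NULL, terms))
for (i in seq_len(n)) {
  for (t in terms) {
    d <- abs(which(tokens == t) - i)
    d <- d[d <= window_size]
    if (length(d) > 0) nearest_distance[i, t] <- min(d)
  }
}
print(nearest_distance)
```

Restricting `j` to positions before or after `i` would give a one-sided variant in the spirit of the `direction` argument, although which side the package treats as left or right should be checked against its source.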

Value

A list with two matrices. position.mat gives the specific position of a term, and window.mat gives the window in which each token occurred. The rows represent the positions of terms and match the input of this function (position, term and context). The columns represent terms.
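
A minimal usage sketch follows, under several assumptions: that corpustools is installed, that `create_tcorpus()` on a character vector produces a feature column named `token`, and that the function is reachable through `:::` (it may not be exported). The example text and the name `res` are made up for illustration, and the call is guarded so the snippet does nothing when the package is unavailable.

```r
res <- NULL

if (requireNamespace("corpustools", quietly = TRUE)) {
  # Hypothetical two-document corpus
  tc <- corpustools::create_tcorpus(
    c("the quick brown fox jumps", "the lazy dog sleeps")
  )

  # ':::' is used because the function may be internal to the package
  res <- corpustools:::tokenWindowOccurence(
    tc,
    feature = "token",
    context_level = "document",
    window.size = 2
  )

  # Inspect the element names and the dimensions of each returned matrix
  print(names(res))
  print(lapply(res, dim))
}
```

If the call succeeds, `res` should be the list described above, with rows corresponding to token positions and columns to terms.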

corpustools

Managing, Querying and Analyzing Tokenized Text

v0.4.10

GPL-3

Authors

Kasper Welbers and Wouter van Atteveldt

Initial release

2022-05-03