Gives the window in which a term occurred, as a matrix.
This function returns the occurrence of tokens (position.matrix) and the window of occurrence (window.matrix). This format enables the co-occurrence of tokens within sliding windows (i.e. token distance) to be calculated by multiplying position.matrix with window.matrix.
tokenWindowOccurence( tc, feature, context_level = c("document", "sentence"), window.size = 10, direction = "<>", distance_as_value = F, batch_rows = NULL, drop_empty_terms = T )
tc |
a tCorpus object |
feature |
The name of the feature column |
context_level |
Select whether to use "document" or "sentence" as context boundaries |
window.size |
The distance within which tokens should occur from each other to be counted as a co-occurrence. |
direction |
a string indicating whether only the left ('<') or right ('>') side of the window, or both ('<>'), should be used. |
distance_as_value |
If TRUE, the values of the matrix will represent the shortest distance to the occurrence of a feature |
batch_rows |
Used in functions that call this function in batches |
drop_empty_terms |
If TRUE, empty terms (with zero occurrences) will be dropped |
A list with two matrices. position.mat gives the specific position of a term, and window.mat gives the window in which each token occurred. The rows represent the position of a term and match the input of this function (position, term and context). The columns represent terms.
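
The multiplication mentioned in the description can be illustrated with a small base R sketch. It does not call tokenWindowOccurence and is not the package's implementation: the toy tokens, the names position_mat, window_mat and cooc, the window size of 1, and the choice to count a token as being inside its own window are all assumptions made for illustration. Because both matrices have positions as rows and terms as columns, the sketch transposes the first one (via crossprod) to obtain a term-by-term result.

```r
# Hypothetical toy input: one document with five token positions
tokens <- c("a", "b", "c", "a", "b")
terms <- sort(unique(tokens))
n <- length(tokens)
window_size <- 1  # assumed for the example; the documented default is 10

# position_mat[i, t] is 1 if the token at position i is term t
position_mat <- matrix(0, nrow = n, ncol = length(terms),
                       dimnames = list(NULL, terms))
position_mat[cbind(seq_len(n), match(tokens, terms))] <- 1

# window_mat[i, t] is 1 if term t occurs within window_size positions of i
# (both directions, and including position i itself: an assumption)
window_mat <- matrix(0, nrow = n, ncol = length(terms),
                     dimnames = list(NULL, terms))
for (i in seq_len(n)) {
  in_window <- which(abs(seq_len(n) - i) <= window_size)
  window_mat[i, unique(tokens[in_window])] <- 1
}

# t(position_mat) %*% window_mat: for each pair of terms (row, column),
# the number of occurrences of the row term that have the column term
# somewhere inside their window
cooc <- crossprod(position_mat, window_mat)
print(cooc)
```

In this toy case cooc["a", "c"] is 1, because only the second "a" has a "c" directly next to it, while cooc["a", "b"] is 2, because both occurrences of "a" are adjacent to a "b".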