Term-co-occurrence matrix construction
This function constructs a term-co-occurrence matrix (TCM). A TCM is usually used with the GloVe word embedding model.
create_tcm(it, vectorizer, skip_grams_window = 5L,
  skip_grams_window_context = c("symmetric", "right", "left"),
  weights = 1/seq_len(skip_grams_window), binary_cooccurence = FALSE, ...)

## S3 method for class 'itoken'
create_tcm(it, vectorizer, skip_grams_window = 5L,
  skip_grams_window_context = c("symmetric", "right", "left"),
  weights = 1/seq_len(skip_grams_window), binary_cooccurence = FALSE, ...)

## S3 method for class 'itoken_parallel'
create_tcm(it, vectorizer, skip_grams_window = 5L,
  skip_grams_window_context = c("symmetric", "right", "left"),
  weights = 1/seq_len(skip_grams_window), binary_cooccurence = FALSE, ...)
it |
|
vectorizer |
|
skip_grams_window |
|
skip_grams_window_context |
one of |
weights |
weights for context/distant words during co-occurrence statistics calculation.
By default we are setting |
binary_cooccurence |
|
... |
placeholder for additional arguments (not used at the moment).
|
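The interaction between skip_grams_window and weights can be pictured with a small base-R sketch. It is only an illustration of distance-weighted co-occurrence counting under the default weights = 1/seq_len(skip_grams_window) shown in the usage above. It is not the package's implementation, and the toy tokens and variable names are made up for the example.

```r
# Toy illustration of distance-weighted co-occurrence counting (base R only).
# Not text2vec code: all names and data here are hypothetical.
tokens  <- c("the", "cat", "sat", "on", "the", "mat")
window  <- 2L
weights <- 1 / seq_len(window)   # weight 1 at distance 1, 0.5 at distance 2

vocab <- unique(tokens)
tcm <- matrix(0, length(vocab), length(vocab), dimnames = list(vocab, vocab))

for (i in seq_along(tokens)) {
  for (d in seq_len(window)) {
    j <- i + d
    if (j > length(tokens)) break
    # Add the weight for distance d to the (word, right-context word) cell.
    tcm[tokens[i], tokens[j]] <- tcm[tokens[i], tokens[j]] + weights[d]
  }
}

# One way to count pairs regardless of direction: add the transpose.
tcm_sym <- tcm + t(tcm)
print(tcm_sym)
```

In this sketch a pair of adjacent words contributes 1 and a pair two positions apart contributes 0.5, so nearer context words count for more.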
If a parallel backend is registered, the TCM is constructed in multiple threads.
The user should split the data and provide a list of itoken iterators.
Each element of it is handled in a separate thread, and the results are
combined at the end of processing.
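The split-then-combine idea can be sketched in base R. This assumes that per-chunk co-occurrence counts are combined by summation, which is an assumption made for illustration and not a statement about the package internals. count_cooc is a hypothetical helper, and lapply stands in for a parallel map.

```r
# Hypothetical helper: distance-weighted right-context counts for one document.
count_cooc <- function(tokens, vocab, window = 2L) {
  w <- 1 / seq_len(window)
  m <- matrix(0, length(vocab), length(vocab), dimnames = list(vocab, vocab))
  for (i in seq_along(tokens)) {
    for (d in seq_len(window)) {
      j <- i + d
      if (j > length(tokens)) break
      m[tokens[i], tokens[j]] <- m[tokens[i], tokens[j]] + w[d]
    }
  }
  m
}

docs  <- list(c("a", "b", "c"), c("b", "c", "a"), c("a", "b"), c("c", "a", "b"))
vocab <- sort(unique(unlist(docs)))

# Split the documents into two chunks, one per (imagined) worker.
chunks <- split(docs, rep(1:2, each = 2))

# Each chunk is processed independently of the others.
partial <- lapply(chunks, function(chunk) {
  Reduce(`+`, lapply(chunk, count_cooc, vocab = vocab))
})

# Combine the per-chunk matrices by summation.
combined <- Reduce(`+`, partial)

# Processing everything in a single pass gives the same counts.
single <- Reduce(`+`, lapply(docs, count_cooc, vocab = vocab))
stopifnot(isTRUE(all.equal(combined, single)))
```

Because every chunk uses the same vocabulary, the partial matrices share dimensions and can be added cell by cell.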
dgTMatrix
TCM matrix
## Not run:
data("movie_review")

# single thread
tokens = word_tokenizer(tolower(movie_review$review))
it = itoken(tokens)
# build the vocabulary from the iterator defined above
v = create_vocabulary(it)
vectorizer = vocab_vectorizer(v)
tcm = create_tcm(itoken(tokens), vectorizer, skip_grams_window = 3L)

# parallel version
# set to number of cores on your machine
# N is the number of reviews to process; 1000 is an illustrative value, not given in the original
N = 1000L
it = itoken_parallel(movie_review$review[1:N], tolower, word_tokenizer, movie_review$id[1:N])
v = create_vocabulary(it)
vectorizer = vocab_vectorizer(v)
dtm = create_dtm(it, vectorizer, type = 'dgTMatrix')
tcm = create_tcm(it, vectorizer, skip_grams_window = 3L, skip_grams_window_context = "symmetric")

## End(Not run)