Vocabulary and hash vectorizers
These functions create an object (closure) that defines how to transform a list of tokens into a vector space, i.e. how to map words to indices. The resulting vectorizer is supposed to be used only as an argument to create_dtm, create_tcm, create_vocabulary.
Usage

vocab_vectorizer(vocabulary)

hash_vectorizer(hash_size = 2^18, ngram = c(1L, 1L), signed_hash = FALSE)
Arguments

vocabulary: a text2vec_vocabulary object, the result of the create_vocabulary function.

hash_size: integer, the number of hash buckets used for the feature hashing trick; must be greater than 0 and is preferably a power of 2. It determines the number of columns of the resulting matrix.

ngram: integer vector of length 2, c(ngram_min, ngram_max): the lower and upper boundary of the range of n-gram sizes to extract.

signed_hash: logical, whether to use a signed hash function to reduce the effect of hash collisions.
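As a minimal sketch of how these parameters interact (not part of the original example), the snippet below builds a unigram-plus-bigram hashed vectorizer with signed hashing enabled. The names it_small, signed_vectorizer and small_dtm, and the hash_size value of 2^16, are arbitrary choices for illustration; the resulting document-term matrix always has exactly hash_size columns, regardless of how many distinct terms occur.

# Sketch: hashed vectorizer over unigrams and bigrams with signed hashing
library(text2vec)
data("movie_review")
it_small = itoken(movie_review$review[1:10], preprocess_function = tolower,
                  tokenizer = word_tokenizer)
signed_vectorizer = hash_vectorizer(hash_size = 2^16, ngram = c(1L, 2L),
                                    signed_hash = TRUE)
small_dtm = create_dtm(it_small, signed_vectorizer)
ncol(small_dtm)  # equals hash_size (2^16)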
Value

A vectorizer object (closure).
data("movie_review") N = 100 vectorizer = hash_vectorizer(2 ^ 18, c(1L, 2L)) it = itoken(movie_review$review[1:N], preprocess_function = tolower, tokenizer = word_tokenizer, n_chunks = 10) hash_dtm = create_dtm(it, vectorizer) it = itoken(movie_review$review[1:N], preprocess_function = tolower, tokenizer = word_tokenizer, n_chunks = 10) v = create_vocabulary(it, c(1L, 1L) ) vectorizer = vocab_vectorizer(v) it = itoken(movie_review$review[1:N], preprocess_function = tolower, tokenizer = word_tokenizer, n_chunks = 10) dtm = create_dtm(it, vectorizer)