
Tokenizer


Description

Returns an object for vectorizing texts and/or turning texts into sequences (lists of word indexes, where the word of rank i in the dataset, starting at 1, has index i).

Usage

Tokenizer(num_words = NULL,
  filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n", lower = TRUE,
  split = " ")

Arguments

num_words

integer or NULL. Maximum number of words to work with; only the most frequent words are kept.

filters

string (a concatenation of characters) to filter out, such as punctuation.

lower

boolean. Whether to convert the text to lowercase.

split

string. Separator for word splitting.

Author(s)

Taylor B. Arnold, taylor.arnold@acm.org

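Examples

A minimal usage sketch, not taken from the original page. The corpus and the num_words value are made up for illustration, and the calls to fit_on_texts and texts_to_sequences assume that the returned object exposes the underlying keras.preprocessing.text.Tokenizer methods through reticulate's $ accessor; consult the kerasR documentation for the exact fitting interface.

library(kerasR)

# A small, made-up corpus
texts <- c("the quick brown fox", "jumps over the lazy dog")

# Keep only the 20 most frequent words; lowercase and split on spaces
tok <- Tokenizer(num_words = 20, lower = TRUE, split = " ")

# Assumed reticulate-style access to the underlying Keras object's methods
tok$fit_on_texts(texts)                # build the word index from the corpus
seqs <- tok$texts_to_sequences(texts)  # each text becomes a list of word indexes
tok$word_index                         # word -> integer rank, starting at 1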


kerasR: R Interface to the Keras Deep Learning Library (version 0.6.1, LGPL-2). Author: Taylor Arnold [aut, cre].
