
Tokenizer


Description

Returns an object for vectorizing texts and/or turning texts into sequences (lists of word indexes, where the word of rank i in the dataset, starting at 1, has index i).

Usage

Tokenizer(num_words = NULL,
  filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n", lower = TRUE,
  split = " ")

Arguments

num_words

integer or NULL. Maximum number of words to work with; only the most frequent words are kept.

filters

string (a concatenation of characters) to filter out, such as punctuation.

lower

boolean. Whether to convert the text to lowercase.

split

string. Separator for word splitting.

Author(s)

Taylor B. Arnold, taylor.arnold@acm.org

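Examples

A minimal usage sketch, not taken from the original page. The corpus and the num_words value are made up for illustration, and the calls to fit_on_texts and texts_to_sequences assume that the returned object exposes the underlying keras.preprocessing.text.Tokenizer methods through reticulate's $ accessor; consult the kerasR documentation for the exact fitting interface.

library(kerasR)

# A small, made-up corpus
texts <- c("the quick brown fox", "jumps over the lazy dog")

# Keep only the 20 most frequent words; lowercase and split on spaces
tok <- Tokenizer(num_words = 20, lower = TRUE, split = " ")

# Assumed reticulate-style access to the underlying Keras object's methods
tok$fit_on_texts(texts)                # build the word index from the corpus
seqs <- tok$texts_to_sequences(texts)  # each text becomes a list of word indexes
tok$word_index                         # word -> integer rank, starting at 1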


kerasR: R Interface to the Keras Deep Learning Library (version 0.6.1, LGPL-2). Author: Taylor Arnold [aut, cre].
