tokenizers: tokenizers – R documentation



Tokenizers

Description

A collection of functions with a consistent interface to convert natural language text into tokens.

Details

The tokenizers in this package share a consistent interface. Each one takes either a character vector of any length or a list in which every element is a character vector of length one; each element is treated as a single text. Each function returns a list with the same length as the input, where each element of the list contains the tokens generated from the corresponding text. If the input character vector or list is named, the names are preserved.
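The input/output contract described above can be sketched in base R. The function below, `tokenize_on_spaces()`, is hypothetical and is not part of the tokenizers package: it splits only on whitespace, whereas the package's own tokenizers (for example `tokenize_words()`) apply more elaborate rules. It is meant only to show the shape of the interface: a character vector, or a list of length-one character vectors, goes in, and a list of the same length with names preserved comes out.

```r
# Hypothetical tokenizer illustrating the interface contract described above.
# It is NOT exported by the tokenizers package; it splits on whitespace only.
tokenize_on_spaces <- function(x) {
  if (is.list(x)) {
    # Every list element must be a character vector of length one (one text).
    ok <- vapply(x, function(el) is.character(el) && length(el) == 1L, logical(1))
    if (!all(ok)) stop("each list element must be a character vector of length one")
    # Flatten to a character vector; vapply keeps the list's names.
    x <- vapply(x, identity, character(1))
  }
  if (!is.character(x)) stop("input must be a character vector or a list")
  # One output element per input text, split on runs of whitespace.
  out <- lapply(x, function(text) {
    tokens <- strsplit(text, "[[:space:]]+")[[1]]
    tokens[nzchar(tokens)]  # drop the empty token left by leading whitespace
  })
  names(out) <- names(x)  # preserve names (NULL if the input was unnamed)
  out
}

# Usage: a named character vector with two texts.
texts <- c(first = "The quick brown fox", second = "jumps over  the lazy dog")
res <- tokenize_on_spaces(texts)
str(res)
```

The output has one element per input text, and the names `first` and `second` carry over, mirroring the behaviour the Details section describes.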

tokenizers

Fast, Consistent Tokenization of Natural Language Text

Version

v0.2.1

License

MIT + file LICENSE

Authors

Lincoln Mullen [aut, cre] (<https://orcid.org/0000-0001-5103-6917>), Os Keyes [ctb] (<https://orcid.org/0000-0001-5196-609X>), Dmitriy Selivanov [ctb], Jeffrey Arnold [ctb] (<https://orcid.org/0000-0001-9953-3904>), Kenneth Benoit [ctb] (<https://orcid.org/0000-0002-0797-564X>)

Initial release

2018-03-29
