Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

tokenizers

Tokenizers


Description

A collection of functions with a consistent interface to convert natural language text into tokens.

Details

The tokenizers in this package have a consistent interface. They all take either a character vector of any length, or a list where each element is a character vector of length one. The idea is that each element comprises a text. Then each function returns a list with the same length as the input vector, where each element in the list are the tokens generated by the function. If the input character vector or list is named, then the names are preserved.


tokenizers

Fast, Consistent Tokenization of Natural Language Text

v0.2.1
MIT + file LICENSE
Authors
Lincoln Mullen [aut, cre] (<https://orcid.org/0000-0001-5103-6917>), Os Keyes [ctb] (<https://orcid.org/0000-0001-5196-609X>), Dmitriy Selivanov [ctb], Jeffrey Arnold [ctb] (<https://orcid.org/0000-0001-9953-3904>), Kenneth Benoit [ctb] (<https://orcid.org/0000-0002-0797-564X>)
Initial release
2018-03-29

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.