
ifiles

Creates an iterator over text files on disk


Description

The result of this function is usually passed to an itoken function.

Usage

ifiles(file_paths, reader = readLines)

idir(path, reader = readLines)

ifiles_parallel(file_paths, reader = readLines, ...)

Arguments

file_paths

character vector of paths to the input files

reader

function that reads a text file from disk; it must take a file path as its first argument. reader() should return a named character vector: each element is a document, and each name is the document id used in DTM construction. If the returned vector is unnamed, document ids are generated as file_name + line_number (assuming that each line is a document).
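As an illustration of the reader contract described above, here is a minimal sketch of a custom reader that treats each whole file as a single document. The name read_whole_file is hypothetical and not part of text2vec; the sketch uses only base R and assumes the files are plain text.

```r
# Hypothetical reader: returns a named character vector of length 1,
# where the element is the file's full text and the name is the document id.
read_whole_file <- function(path) {
  text <- paste(readLines(path, warn = FALSE), collapse = "\n")
  names(text) <- basename(path)  # document id = file name without directory
  text
}

# Try it on a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), tmp)
doc <- read_whole_file(tmp)

# With text2vec loaded, the reader would be supplied as:
# files_iterator <- ifiles(tmp, reader = read_whole_file)
```

Because the returned vector is named, the ids come from the reader rather than from the file_name + line_number fallback.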

path

character path of directory. All files in the directory will be read.

...

other arguments (not used at the moment)

See Also

Examples

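## Not run: 
# Paths of the entries in the working directory
current_dir_files = list.files(path = ".", full.names = TRUE)
# Sequential iterator over those files
files_iterator = ifiles(current_dir_files)
# Parallel variant; its result is passed to itoken_parallel
parallel_files_iterator = ifiles_parallel(current_dir_files, n_chunks = 4)
it = itoken_parallel(parallel_files_iterator)
# Hash tokens into 2^16 columns and build the document-term matrix
dtm = create_dtm(it, hash_vectorizer(2^16), type = 'dgTMatrix')

## End(Not run)
# Iterator over all files in a directory
dir_files_iterator = idir(path = ".")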

text2vec

Modern Text Mining Framework for R

v0.6
GPL (>= 2) | file LICENSE
Authors
Dmitriy Selivanov [aut, cre, cph], Manuel Bickel [aut, cph] (Coherence measures for topic models), Qing Wang [aut, cph] (Author of the WarpLDA C++ code)
Initial release
