Split a character string or corpus into segments
Split a character string or corpus into segments, taking into account punctuation where possible
split_segments(obj, segment_size = 40, segment_size_window = NULL)

## S3 method for class 'character'
split_segments(obj, segment_size = 40, segment_size_window = NULL)

## S3 method for class 'Corpus'
split_segments(obj, segment_size = 40, segment_size_window = NULL)

## S3 method for class 'corpus'
split_segments(obj, segment_size = 40, segment_size_window = NULL)
obj | character string, or a quanteda or tm corpus object
segment_size | segment size (in words)
segment_size_window | window around the segment size in which to look for the best splitting point
If obj is a tm or quanteda corpus object, the result is a quanteda corpus.
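The interaction between segment_size and segment_size_window can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper name sketch_split, its window argument, the default window of 5 words, and the punctuation set are all assumptions made for illustration. The idea shown is to aim for segment_size words per segment and, when a word ending in punctuation falls within the window around that target, to cut there instead.

```r
# Hypothetical helper for illustration only, NOT the package's own code.
# Splits `text` into segments of roughly `segment_size` words, preferring
# to cut after a word ending in punctuation when one lies within `window`
# words of the target cut position.
sketch_split <- function(text, segment_size = 40, window = 5) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  n <- length(words)
  segments <- character(0)
  start <- 1
  while (start <= n) {
    target <- start + segment_size - 1
    if (target >= n) {
      # Remaining words fit in one segment
      end <- n
    } else {
      # Candidate cut positions inside the window around the target
      lo <- max(start, target - window)
      hi <- min(n, target + window)
      candidates <- lo:hi
      # Keep only candidates whose word ends with punctuation (assumed set)
      punct <- candidates[grepl("[.!?;:,]$", words[candidates])]
      if (length(punct) > 0) {
        # Choose the punctuation position closest to the target
        end <- punct[which.min(abs(punct - target))]
      } else {
        # No punctuation nearby: cut exactly at the target size
        end <- target
      }
    }
    segments <- c(segments, paste(words[start:end], collapse = " "))
    start <- end + 1
  }
  segments
}

txt <- "one two three. four five six seven, eight nine ten eleven twelve."
sketch_split(txt, segment_size = 4, window = 1)
```

With these inputs the sketch yields three segments, each ending on a punctuation mark rather than at exactly four words. A wider window gives more freedom to cut at punctuation, at the cost of less uniform segment lengths.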
require(quanteda)
split_segments(data_corpus_inaugural)