Compare vocabulary of a subset of a tCorpus to the rest of the tCorpus
compare_subset(
  tc,
  feature,
  subset_x = NULL,
  subset_meta_x = NULL,
  query_x = NULL,
  query_feature = "token",
  smooth = 0.1,
  min_ratio = NULL,
  min_chi2 = NULL,
  yates_cor = c("auto", "yes", "no"),
  what = c("freq", "docfreq", "cooccurrence")
)
tc |
a tCorpus object |
feature |
the column name of the feature that is to be compared |
subset_x |
an expression to subset the tCorpus. The vocabulary of the subset will be compared to the rest of the tCorpus |
subset_meta_x |
like subset_x, but using the meta data |
query_x |
like subset_x, but using a query search to select documents (see search_contexts) |
query_feature |
if query_x is used, the column name of the feature used in the query search. |
smooth |
Laplace smoothing is used for the calculation of the probabilities. Here you can set the added (pseudocount) value. |
min_ratio |
threshold for the ratio value, which is the ratio of the relative frequency of a term in dtm.x and dtm.y (here, the subset and the rest of the tCorpus) |
min_chi2 |
threshold for the chi^2 value |
yates_cor |
mode for using Yates' correction in the chi^2 calculation. Can be turned on ("yes") or off ("no"), or set to "auto", in which case Cochran's rule is used to determine whether Yates' correction is used. |
what |
choose whether to compare the frequency ("freq") of terms, or the document frequency ("docfreq"). This also affects how chi^2 is calculated, comparing either freq relative to vocabulary size or docfreq relative to corpus size (N) |
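To make the smooth, min_ratio, min_chi2 and yates_cor arguments more concrete, the following is a minimal base R sketch of how a smoothed ratio and a 2x2 chi^2 value could be computed per term. It is not the corpustools implementation. The toy counts, the helper name chi2_2x2 and its argument names are hypothetical, the chi^2 is computed on raw (unsmoothed) counts, and "auto" uses one common reading of Cochran's rule for 2x2 tables (apply Yates' correction when any expected cell count is below 5); all of these are assumptions.

```r
# Hypothetical toy term counts: subset (x) versus the rest of the corpus (y).
freq_x <- c(economy = 30, war = 5, health = 0)
freq_y <- c(economy = 40, war = 60, health = 10)
smooth <- 0.1  # pseudocount added to every term count

# Smoothed relative frequencies and their ratio (x relative to y).
p_x <- (freq_x + smooth) / sum(freq_x + smooth)
p_y <- (freq_y + smooth) / sum(freq_y + smooth)
ratio <- p_x / p_y

# chi^2 for one 2x2 table: (term, other terms) by (subset, rest).
# Assumption: "auto" applies Yates' correction if any expected count is < 5.
chi2_2x2 <- function(x_term, y_term, x_other, y_other,
                     yates_cor = c("auto", "yes", "no")) {
  yates_cor <- match.arg(yates_cor)
  n <- x_term + y_term + x_other + y_other
  row_tot <- c(x_term + y_term, x_other + y_other)
  col_tot <- c(x_term + x_other, y_term + y_other)
  if (yates_cor == "auto") {
    expected <- outer(row_tot, col_tot) / n
    yates_cor <- if (any(expected < 5)) "yes" else "no"
  }
  diff <- abs(x_term * y_other - y_term * x_other)
  # Yates' continuity correction, floored at zero.
  if (yates_cor == "yes") diff <- max(0, diff - n / 2)
  n * diff^2 / (prod(row_tot) * prod(col_tot))
}

# Apply the helper to every term; "other" is the total minus the term count.
chi2 <- mapply(chi2_2x2,
               x_term = freq_x, y_term = freq_y,
               x_other = sum(freq_x) - freq_x,
               y_other = sum(freq_y) - freq_y)

result <- data.frame(feature = names(freq_x), ratio = ratio, chi2 = chi2)
print(result)
```

In this sketch, thresholds in the spirit of min_ratio and min_chi2 would amount to keeping only the rows of result whose ratio or chi2 value is at or above the chosen cutoff.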
A vocabularyComparison object
tc = create_tcorpus(sotu_texts, doc_column = 'id')
tc$preprocess('token', 'feature', remove_stopwords = TRUE, use_stemming = TRUE)

comp = compare_subset(tc, 'feature', subset_meta_x = president == 'Barack Obama')
comp = comp[order(-comp$chi),]
head(comp)
plot(comp)

comp = compare_subset(tc, 'feature', query_x = 'terroris*')
comp = comp[order(-comp$chi),]
head(comp, 10)