ngrams

Get N-Grams


Description

Count n-grams, either of words, or of characters.
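
As a rough illustration of what counting word n-grams means, the following base R sketch builds bigrams from a toy token vector and tabulates them. It does not use polmineR and is not the package's implementation; the helper `word_ngrams()` and the sample tokens are assumptions made for this example.

```r
# Toy token stream; with polmineR the tokens would come from a corpus.
tokens <- c("the", "cat", "sat", "on", "the", "cat")
n <- 2L

# Hypothetical helper (not part of polmineR): join each token with
# its n - 1 successors to form one n-gram per start position.
word_ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character())
  starts <- seq_len(length(tokens) - n + 1L)
  vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1L)], collapse = " "),
    character(1)
  )
}

bigrams <- word_ngrams(tokens, n)

# Tabulate the n-grams and sort by frequency, most frequent first.
counts <- sort(table(bigrams), decreasing = TRUE)
print(counts)
```

A sequence of k tokens yields k - n + 1 n-grams, so the six tokens above produce five bigrams.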

Usage

ngrams(.Object, ...)

## S4 method for signature 'partition'
ngrams(
  .Object,
  n = 2,
  p_attribute = "word",
  char = NULL,
  progress = FALSE,
  ...
)

## S4 method for signature 'character'
ngrams(
  .Object,
  n = 2,
  p_attribute = "word",
  char = NULL,
  progress = FALSE,
  ...
)

## S4 method for signature 'subcorpus'
ngrams(
  .Object,
  n = 2,
  p_attribute = "word",
  char = NULL,
  progress = FALSE,
  ...
)

## S4 method for signature 'data.table'
ngrams(.Object, n = 2L, p_attribute = "word")

## S4 method for signature 'corpus'
ngrams(
  .Object,
  n = 2,
  p_attribute = "word",
  char = NULL,
  progress = FALSE,
  ...
)

## S4 method for signature 'partition_bundle'
ngrams(
  .Object,
  n = 2,
  char = NULL,
  p_attribute = "word",
  mc = FALSE,
  progress = FALSE,
  ...
)

Arguments

.Object

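An object of class partition, subcorpus, corpus, partition_bundle, character or data.table (see the method signatures under Usage).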

...

Further arguments.

n

The number of tokens (or characters, if char is not NULL) that make up each n-gram.

p_attribute

The p-attribute to use. A character vector naming more than one p-attribute can be supplied.

char

If NULL, tokens are counted. Otherwise, characters are counted, keeping only the characters provided in this character vector.

progress

A logical value, whether to show progress.

mc

A logical value, whether to use multicore processing. It is passed into the call to blapply (see the respective documentation) and is only available in the method for partition_bundle objects.
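
The effect of the char argument can be pictured with a small base R sketch. It assumes that characters not listed in char are dropped before the n-grams are formed; this is a reading of the argument description above, not a statement about polmineR internals. The helper `char_ngrams()` and the sample text are hypothetical.

```r
# Sample text and the set of characters to keep (lower-case ASCII letters).
text <- "abc, abd"
char <- letters
n <- 2L

# Hypothetical helper (not part of polmineR): split the text into
# single characters, keep only those listed in `char`, then form
# overlapping character n-grams from what remains.
char_ngrams <- function(text, n, char) {
  chars <- unlist(strsplit(text, ""))
  chars <- chars[chars %in% char]
  if (length(chars) < n) return(character())
  starts <- seq_len(length(chars) - n + 1L)
  vapply(
    starts,
    function(i) paste(chars[i:(i + n - 1L)], collapse = ""),
    character(1)
  )
}

grams <- char_ngrams(text, n, char)

# Tabulate and sort by frequency, most frequent first.
char_counts <- sort(table(grams), decreasing = TRUE)
print(char_counts)
```

Under this assumption, n-grams can span positions where a character was removed: the comma and the space are dropped here, so "ca" is counted although "c" and "a" are not adjacent in the original text.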

Examples

use("polmineR")
P <- partition("GERMAPARLMINI", date = "2009-10-27")
ngramObject <- ngrams(P, n = 2, p_attribute = "word", char = NULL)

# a more complex scenario: get most frequent ADJA/NN-combinations
ngramObject <- ngrams(P, n = 2, p_attribute = c("word", "pos"), char = NULL)
ngramObject2 <- subset(
  ngramObject,
  ngramObject[["1_pos"]] == "ADJA" & ngramObject[["2_pos"]] == "NN"
)
ngramObject2@stat[, "1_pos" := NULL][, "2_pos" := NULL]
ngramObject3 <- sort(ngramObject2, by = "count")
head(ngramObject3)

# n-grams from a data.table of decoded tokens
use("polmineR")
dt <- decode("REUTERS", p_attribute = "word", s_attribute = character(), to = "data.table")
y <- ngrams(dt, n = 3L, p_attribute = "word")
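
The ADJA/NN scenario above can also be traced without a corpus. The following base R sketch pairs each token with its successor, carries the part-of-speech tags along, and keeps adjective-noun pairs. The toy data are invented for illustration. The column names "1_pos" and "2_pos" follow the example above, while "1_word" and "2_word" are assumed by analogy.

```r
# Toy annotated token stream (invented data, STTS-style tags).
tokens <- data.frame(
  word = c("die", "neue", "Regierung", "hat", "klare", "Ziele",
           "und", "neue", "Regierung"),
  pos  = c("ART", "ADJA", "NN", "VAFIN", "ADJA", "NN",
           "KON", "ADJA", "NN"),
  stringsAsFactors = FALSE
)

# Pair every token (position i) with its successor (position i + 1).
i <- seq_len(nrow(tokens) - 1L)
bigrams <- data.frame(
  `1_word` = tokens$word[i],
  `2_word` = tokens$word[i + 1L],
  `1_pos`  = tokens$pos[i],
  `2_pos`  = tokens$pos[i + 1L],
  check.names = FALSE,       # keep column names that start with a digit
  stringsAsFactors = FALSE
)

# Keep adjective-noun pairs only, then count them by word form.
adj_noun <- bigrams[bigrams[["1_pos"]] == "ADJA" & bigrams[["2_pos"]] == "NN", ]
adj_noun_counts <- sort(
  table(paste(adj_noun[["1_word"]], adj_noun[["2_word"]])),
  decreasing = TRUE
)
print(adj_noun_counts)
```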

polmineR

Verbs and Nouns for Corpus Analysis

v0.8.5
GPL-3
Authors
Andreas Blaette [aut, cre] (<https://orcid.org/0000-0001-8970-8010>), Christoph Leonhardt [ctb]
Initial release
2020-09-22
