zipfR: lnre_vgc – R documentation

lnre_vgc

Expected Vocabulary Growth Curves of LNRE Model (zipfR)

Description

lnre.vgc computes the expected vocabulary growth curve E[V(N)] of an LNRE model and returns it as an object of class vgc. Data points are computed for the specified values of N. The result can optionally include variance estimates and growth curves E[V_m(N)] for the spectrum elements.

Usage

lnre.vgc(model, N, m.max=0, variances=FALSE)

Arguments

`model`	an object belonging to a subclass of `lnre`, representing an LNRE model
`N`	an increasing sequence of non-negative integers, specifying the sample sizes N for which vocabulary growth data should be calculated
`m.max`	if specified, include vocabulary growth curves E[V_m(N)] for spectrum elements up to `m.max`. Must be a single integer in the range 1 … 9. With the default, `m.max=0`, only E[V(N)] is computed.
`variances`	if `TRUE`, include variance estimates for the vocabulary size (and the spectrum elements, if applicable)
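
The arguments above can be combined as in the following minimal sketch. It assumes the zipfR package is installed and reuses the ItaUltra.spc dataset and ZM model from the Examples section. The variable names `N.values`, `vgc.basic` and `vgc.full` are illustrative only and not part of the package.

```r
library(zipfR)

## fit a ZM model to the Italian ultra- prefix spectrum shipped with zipfR
data(ItaUltra.spc)
zm <- lnre("zm", ItaUltra.spc)

## N must be an increasing sequence of non-negative sample sizes
N.values <- (1:100) * 70

## default call: expected vocabulary size E[V(N)] only
vgc.basic <- lnre.vgc(zm, N.values)

## additionally request E[V_1(N)] and variance estimates
vgc.full <- lnre.vgc(zm, N.values, m.max = 1, variances = TRUE)
```

Both calls return a vgc object with one data point per element of `N.values`; the second one carries the extra spectrum-element and variance information described under Value.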

Value

An object of class vgc, representing the expected vocabulary growth curve E[V(N)] of the LNRE model `model`, with data points at the sample sizes N.

If m.max is specified, expected growth curves E[V_m(N)] for spectrum elements (hapax legomena, dis legomena, etc.) up to m.max are also computed.

If variances=TRUE, the vgc object includes variance data for all growth curves.
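
As a hedged illustration of how the returned object might be inspected, the sketch below uses the zipfR accessor functions N(), V(), Vm() and VV() on a vgc object. The interval calculation at the end is an assumption made for this example (a normal approximation with a 1.96 multiplier); it is not necessarily how the package's plot method derives its confidence intervals. The names `lower` and `upper` are illustrative.

```r
library(zipfR)

data(ItaUltra.spc)
zm <- lnre("zm", ItaUltra.spc)
zm.vgc <- lnre.vgc(zm, (1:100) * 70, m.max = 1, variances = TRUE)

## first few data points of each component
head(N(zm.vgc))       # sample sizes N
head(V(zm.vgc))       # expected vocabulary size E[V(N)]
head(Vm(zm.vgc, 1))   # expected number of hapax legomena E[V_1(N)]
head(VV(zm.vgc))      # variance estimates for V(N)

## ASSUMPTION: approximate 95% interval via a normal approximation
lower <- V(zm.vgc) - 1.96 * sqrt(VV(zm.vgc))
upper <- V(zm.vgc) + 1.96 * sqrt(VV(zm.vgc))
```

The variance accessors are only meaningful when the curve was computed with variances=TRUE, and Vm() only for values of m up to the m.max used in the call.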

Examples

## attach the package so that the datasets and lnre functions are available
library(zipfR)

## load Dickens dataset and estimate lnre models
data(Dickens.spc)

zm <- lnre("zm",Dickens.spc)
fzm <- lnre("fzm",Dickens.spc,exact=FALSE)
gigp <- lnre("gigp",Dickens.spc)

## compute expected V and V_1 growth up to 100 million tokens
## in 100 steps of 1 million tokens
zm.vgc <- lnre.vgc(zm,(1:100)*1e6, m.max=1)
fzm.vgc <- lnre.vgc(fzm,(1:100)*1e6, m.max=1)
gigp.vgc <- lnre.vgc(gigp,(1:100)*1e6, m.max=1)

## compare
plot(zm.vgc,fzm.vgc,gigp.vgc,add.m=1,legend=c("ZM","fZM","GIGP"))

## load Italian ultra- prefix data
data(ItaUltra.spc)

## compute zm model
zm <- lnre("zm",ItaUltra.spc)

## compute vgc up to about twice the sample size
## with variance of V
zm.vgc <- lnre.vgc(zm,(1:100)*70, variances=TRUE)

## plot with confidence intervals derived from the variance stored
## in the vgc object (with larger datasets, the intervals will
## typically be almost invisible)
plot(zm.vgc)

zipfR

Statistical Models for Word Frequency Distributions

v0.6-70

GPL-3

Authors

Stefan Evert <stefan.evert@fau.de>, Marco Baroni <marco.baroni@unitn.it>

Initial release

2020-10-10