lnre_vgc

Expected Vocabulary Growth Curves of LNRE Model (zipfR)


Description

lnre.vgc computes expected vocabulary growth curves E[V(N)] according to a LNRE model, returning an object of class vgc. Data points are returned for the specified values of N, optionally including estimated variances and/or growth curves for the spectrum elements E[V_m(N)].

Usage

lnre.vgc(model, N, m.max=0, variances=FALSE)

Arguments

model

an object belonging to a subclass of lnre, representing a LNRE model

N

an increasing sequence of non-negative integers, specifying the sample sizes N for which vocabulary growth data should be calculated

m.max

if specified, also include vocabulary growth curves E[V_m(N)] for the spectrum elements m = 1, …, m.max. Must be a single integer in the range 1 … 9; with the default m.max=0, only E[V(N)] is computed.

variances

if TRUE, include variance estimates for the vocabulary size (and the spectrum elements, if applicable)

Value

An object of class vgc, representing the expected vocabulary growth curve E[V(N)] of the LNRE model passed as model, with data points at the sample sizes N.

If m.max is specified, expected growth curves E[V_m(N)] for spectrum elements (hapax legomena, dis legomena, etc.) up to m.max are also computed.

If variances=TRUE, the vgc object includes variance data for all growth curves.
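
The returned object can be inspected as in the following minimal sketch. It assumes that the zipfR accessor functions N(), V(), Vm() and VV() apply to vgc objects and that the object behaves like a data frame, as described on the vgc help page; the variable names (sizes, ev, ev1, var.v) are chosen here for illustration only.

```r
library(zipfR)

## estimate a ZM model on the small Italian ultra- prefix dataset
data(ItaUltra.spc)
zm <- lnre("zm", ItaUltra.spc)

## expected V, V_1 and V_2 at ten sample sizes, with variances
sizes <- (1:10) * 700
zm.vgc <- lnre.vgc(zm, sizes, m.max=2, variances=TRUE)

## the result is a vgc object; look at the first few data points
class(zm.vgc)
head(zm.vgc)

## extract individual curves with the accessor functions
## (assumed to work on vgc objects, see ?vgc)
ev    <- V(zm.vgc)      # expected vocabulary size E[V(N)]
ev1   <- Vm(zm.vgc, 1)  # expected number of hapax legomena E[V_1(N)]
var.v <- VV(zm.vgc)     # variance of V(N), available because variances=TRUE
```

Without variances=TRUE the variance data are not part of the object, so the call would have to be repeated with that option before variances can be extracted.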

See Also

vgc for more information about vocabulary growth curves and links to relevant functions; lnre for more information about LNRE models and how to initialize them

Examples

## load Dickens dataset and estimate lnre models
data(Dickens.spc)

zm <- lnre("zm",Dickens.spc)
fzm <- lnre("fzm",Dickens.spc,exact=FALSE)
gigp <- lnre("gigp",Dickens.spc)

## compute expected V and V_1 growth up to 100 million tokens
## in 100 steps of 1 million tokens
zm.vgc <- lnre.vgc(zm,(1:100)*1e6, m.max=1)
fzm.vgc <- lnre.vgc(fzm,(1:100)*1e6, m.max=1)
gigp.vgc <- lnre.vgc(gigp,(1:100)*1e6, m.max=1)

## compare
plot(zm.vgc,fzm.vgc,gigp.vgc,add.m=1,legend=c("ZM","fZM","GIGP"))

## load Italian ultra- prefix data
data(ItaUltra.spc)

## compute zm model
zm <- lnre("zm",ItaUltra.spc)

## compute vgc up to about twice the sample size
## with variance of V
zm.vgc <- lnre.vgc(zm,(1:100)*70, variances=TRUE)

## plot with confidence intervals derived from variance in
## vgc (with larger datasets, ci will typically be almost
## invisible)
plot(zm.vgc)
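
The variance data can also be used directly. The sketch below computes approximate 95% intervals for V(N) by hand, assuming a normal approximation (expected value plus or minus 1.96 standard deviations); this is an illustration and not necessarily the exact computation performed by plot(). The accessors V() and VV() are assumed to work on vgc objects, and the names sd.v, lower and upper are introduced here for the example.

```r
library(zipfR)

## ZM model and growth curve with variances, as in the example above
data(ItaUltra.spc)
zm <- lnre("zm", ItaUltra.spc)
zm.vgc <- lnre.vgc(zm, (1:100)*70, variances=TRUE)

## normal-approximation interval (an assumption of this sketch):
## E[V(N)] +/- z * sqrt(Var[V(N)])
z     <- qnorm(0.975)        # approx. 1.96 for a two-sided 95% interval
sd.v  <- sqrt(VV(zm.vgc))    # standard deviation of V(N)
lower <- V(zm.vgc) - z * sd.v
upper <- V(zm.vgc) + z * sd.v

## show the interval at the last few sample sizes
tail(data.frame(N=N(zm.vgc), lower=lower, EV=V(zm.vgc), upper=upper))
```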

zipfR

Statistical Models for Word Frequency Distributions

v0.6-70
GPL-3
Authors
Stefan Evert <stefan.evert@fau.de>, Marco Baroni <marco.baroni@unitn.it>
Initial release
2020-10-10
