zipfR: read_write_vgc – R documentation

Loading and Saving Vocabulary Growth Curves (zipfR)

Description

read.vgc loads vocabulary growth data from a .vgc file.

write.vgc saves vocabulary growth data to a .vgc file.

Usage

read.vgc(file)

write.vgc(vgc, file)

Arguments

`file`	character string specifying the pathname of a disk file. Files with extension `.gz`, `.bz2` or `.xz` will automatically be decompressed (when reading) or compressed (when writing). See section "Format" for a description of the required file format.
`vgc`	a vocabulary growth curve, i.e. an object of class `vgc`

Format

A TAB-delimited text file with column headers but no row names (suitable for reading with read.delim). The file must contain at least the following two columns:

N: increasing integer vector of sample sizes N
V: corresponding observed vocabulary sizes V(N) or expected vocabulary sizes E[V(N)]

Optionally, columns V1, ..., V9 can be added to specify the number of hapaxes (V_1(N)), dis legomena (V_2(N)), and further spectrum elements up to V_9(N).

It is not necessary to include all 9 columns, but for any V_m(N) in the data set, all "lower" spectrum elements V_{m'}(N) (for m' < m) must also be present. For example, it is valid to have columns V1 V2 V3, but not V1 V3 V5 or V2 V3 V4.
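The contiguity rule above can be checked mechanically. The following is a minimal sketch in base R; `has_contiguous_spectrum` is a hypothetical helper written for illustration, not a zipfR function, and it only looks at column names.

```r
# Hypothetical helper (not part of zipfR): checks that the spectrum columns
# V1, V2, ... among a set of column names form an unbroken run starting at V1.
has_contiguous_spectrum <- function(col_names) {
  # keep only names of the form V1 ... V9
  spec <- grep("^V[1-9]$", col_names, value = TRUE)
  m <- sort(as.integer(sub("^V", "", spec)))
  # valid if there are no spectrum columns, or if they are exactly 1..max(m)
  length(m) == 0 || identical(m, seq_len(max(m)))
}

has_contiguous_spectrum(c("N", "V", "V1", "V2", "V3"))  # valid layout
has_contiguous_spectrum(c("N", "V", "V1", "V3", "V5"))  # gaps at V2 and V4
has_contiguous_spectrum(c("N", "V", "V2", "V3", "V4"))  # V1 is missing
```

The first call returns TRUE and the other two return FALSE, matching the valid and invalid examples given above.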

Variances for expected vocabulary sizes and spectrum elements can be given in further columns VV (for Var[V(N)]), and VV1, ..., VV9 (for Var[V_m(N)]). VV is mandatory in this case, and columns VVm must be specified for exactly the same frequency classes m as the Vm above.

These columns may appear in any order in the text file. All other columns will be silently ignored.
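To make the layout concrete, here is a small sketch that produces a file in this format using base R only. The numbers are made up for illustration, and the sketch assumes nothing about how write.vgc itself is implemented; it only shows a TAB-delimited file with a header line and no row names, read back with read.delim.

```r
# Made-up illustration data: sample sizes N, vocabulary sizes V, hapax counts V1
growth <- data.frame(
  N  = c(100L, 200L, 300L),
  V  = c(60L, 100L, 130L),
  V1 = c(40L, 62L, 78L)
)

path <- tempfile(fileext = ".vgc")

# TAB-delimited, header line, no row names, no quoting
write.table(growth, path, sep = "\t", row.names = FALSE, quote = FALSE)

# read.delim expects this layout (TAB separator, header line)
restored <- read.delim(path)

# the mandatory columns are present and N is strictly increasing
stopifnot(all(c("N", "V") %in% names(restored)))
stopifnot(!is.unsorted(restored$N, strictly = TRUE))
```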

Details

If the filename file ends in the extension .gz, .bz2 or .xz, the disk file will automatically be decompressed (read.vgc) or compressed (write.vgc).
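The effect of a compressed file name can be imitated with base R connections. This is only a sketch of the general mechanism for the `.gz` case, with made-up data; whether read.vgc and write.vgc use the same connections internally is an assumption, not something stated on this page.

```r
# Made-up illustration data
growth <- data.frame(N = c(100L, 200L), V = c(60L, 100L))
gz_path <- tempfile(fileext = ".vgc.gz")

# write the TAB-delimited table through a gzip-compressing connection
con <- gzfile(gz_path, open = "wt")
write.table(growth, con, sep = "\t", row.names = FALSE, quote = FALSE)
close(con)

# read it back through a decompressing connection
con <- gzfile(gz_path, open = "rt")
restored <- read.delim(con)
close(con)

# the file on disk starts with the gzip magic bytes 1f 8b
magic <- readBin(gz_path, what = "raw", n = 2L)
```

Base R provides `bzfile()` and `xzfile()` as the analogous connections for the other two extensions.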

Value

read.vgc returns an object of class vgc (see the vgc manpage for details).

Examples

## save Italian ultra- prefix VGC to external text file
fname <- tempfile(fileext=".vgc")
write.vgc(ItaUltra.emp.vgc, fname)
## now <fname> is a TAB-delimited text file with columns N, V and V1

## we read it back in
New.vgc <- read.vgc(fname)

## same vgc as ItaUltra.emp.vgc, compare:
summary(New.vgc)
summary(ItaUltra.emp.vgc)
head(New.vgc)
head(ItaUltra.emp.vgc)

stopifnot(isTRUE(all.equal(New.vgc, ItaUltra.emp.vgc))) # should be identical

zipfR

Statistical Models for Word Frequency Distributions

v0.6-70

GPL-3

Authors

Stefan Evert <stefan.evert@fau.de>, Marco Baroni <marco.baroni@unitn.it>

Initial release

2020-10-10
