
EvertLuedeling2001

Samples of German Word Formation Affixes (zipfR)


Description

Corpus data for measuring the productivity of German word formation affixes -bar, -lich, -sam, -ös, -tum, Klein-, -chen and -lein (Evert & Lüdeling 2001). Data were extracted from two volumes of the German daily newspaper Stuttgarter Zeitung, then manually cleaned and normalized.

Usage

EvertLuedeling2001

Format

A list of 8 character vectors for the different affixes, with names klein (Klein-), bar (-bar), chen (-chen), lein (-lein), lich (-lich), oes (-ös), sam (-sam), tum (-tum).

Each vector contains all relevant tokens from the corpus in their original (chronological) ordering, so vocabulary growth curves can be determined from the vectors in addition to type frequency lists and frequency spectra.
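Because the tokens are kept in corpus order, the three representations mentioned above can all be derived from a single vector. The following is a minimal sketch for the -bar sample, assuming the zipfR package is installed; the variable names (bar, bar.tfl, bar.spc, bar.vgc, growth) are illustrative only. The base R line is included as a cross-check on what a vocabulary growth curve records.

```r
library(zipfR)

# ordered token vector for the suffix -bar
bar <- EvertLuedeling2001$bar

bar.tfl <- vec2tfl(bar)  # type frequency list
bar.spc <- vec2spc(bar)  # frequency spectrum
bar.vgc <- vec2vgc(bar)  # vocabulary growth curve (relies on token order)

# base R cross-check: number of distinct types seen after each token
growth <- cumsum(!duplicated(bar))

# sample size and vocabulary size agree across representations
c(N = N(bar.tfl), V = V(bar.tfl), V.final = tail(growth, 1))
```

Shuffling the vector would leave the type frequency list and the spectrum unchanged but would alter the growth curve, which is why the original ordering matters.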

References

Evert, Stefan and Lüdeling, Anke (2001). Measuring morphological productivity: Is automatic preprocessing sufficient? In Proceedings of the Corpus Linguistics 2001 Conference, pages 167–175, Lancaster, UK.

Examples

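library(zipfR)

# structure of the data set: a named list of character vectors
str(EvertLuedeling2001)

# token and type counts for the different affixes
sapply(EvertLuedeling2001, function(x) {
  y <- vec2tfl(x)        # convert token vector to type frequency list
  c(N = N(y), V = V(y))  # N = number of tokens, V = number of types
})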
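
The help page does not say which productivity measure to compute. As one possible illustration, the sketch below uses the hapax-based index P = V1 / N (number of types occurring exactly once, divided by the number of tokens), which is commonly attributed to Baayen. The choice of measure and the name productivity are assumptions made for this example, not something prescribed by the data set.

```r
library(zipfR)

# per-affix sample size, vocabulary size, hapax count and P = V1 / N
productivity <- sapply(EvertLuedeling2001, function(x) {
  spc <- vec2spc(x)     # frequency spectrum of the token vector
  v1 <- Vm(spc, 1)      # number of hapax legomena
  c(N = N(spc), V = V(spc), V1 = v1, P = v1 / N(spc))
})

# one row per affix
round(t(productivity), 4)
```

P depends strongly on sample size, so values for affixes with very different token counts should be compared with caution.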

zipfR

Statistical Models for Word Frequency Distributions

Version: 0.6-70
License: GPL-3
Authors: Stefan Evert <stefan.evert@fau.de>, Marco Baroni <marco.baroni@unitn.it>
Initial release: 2020-10-10
