Brown Corpus Frequency Data (zipfR)
These data were extracted from the Brown corpus (see Kucera and Francis 1967).
Brown.emp.vgc is the empirical vocabulary growth curve, reflecting the development of V and V(1) in the non-randomized corpus.
We removed numbers and other forms of non-linguistic material before collecting word counts from the Brown corpus.
Kucera, H. and Francis, W.N. (1967). Computational analysis of present-day American English. Brown University Press, Providence.
The datasets documented in BrownSubsets pertain to various subsets of the Brown corpus (e.g., informative prose, adjectives only).
library(zipfR)
data(Brown.tfl)       # type frequency list
summary(Brown.tfl)
data(Brown.spc)       # frequency spectrum
summary(Brown.spc)
data(Brown.emp.vgc)   # empirical vocabulary growth curve
summary(Brown.emp.vgc)