languageR: dutchSpeakersDist – R documentation

dutchSpeakersDist

Cross-entropy based distances between speakers

Description

A distance matrix for the conversations of 165 speakers in the Spoken Dutch Corpus. Metadata on the speakers are available in a separate dataset, dutchSpeakersDistMeta.

Usage

data(dutchSpeakersDist)

Format

A data frame holding a 165 by 165 matrix of between-speaker distances.
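Because the distances are stored as a data frame rather than as a dist object, a few sanity checks are useful before passing the data to functions such as cmdscale. The following is a minimal sketch on an invented 4 by 4 stand-in; the speaker labels and values are hypothetical, and the symmetry and zero diagonal shown here are properties of the toy data, not verified properties of dutchSpeakersDist.

```r
# Toy stand-in for dutchSpeakersDist: a 4 x 4 data frame of distances.
# The labels "spk1".."spk4" and all values are invented for illustration.
set.seed(1)
pts <- matrix(rnorm(8), nrow = 4,
              dimnames = list(paste0("spk", 1:4), NULL))
toy <- as.data.frame(as.matrix(dist(pts)))

dim(toy)                          # square: one row and one column per speaker

m <- as.matrix(toy)
is.sym    <- isSymmetric(unname(m))   # does d(a, b) equal d(b, a)?
zero.diag <- all(diag(m) == 0)        # is every self-distance zero?

# as.dist() reads only the lower triangle of its input, so an
# asymmetric matrix would lose its upper triangle without warning.
toy.d <- as.dist(toy)
n.obj <- attr(toy.d, "Size")      # number of objects in the dist object
```

Running the same checks on the real data before calling as.dist would show whether discarding the upper triangle loses any information.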

Source

http://lands.let.kun.nl/cgn/ (data collected and analyzed in collaboration with Patrick Juola)

References

Juola, P. (2003) The time course of language change, Computers and the Humanities, 37, 77-96.

Juola, P. and Baayen, R. H. (2005) A Controlled-corpus Experiment in Authorship Identification by Cross-entropy, Literary and Linguistic Computing, 20, 59-67.

Examples

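## Not run: 
    # Load the distance matrix and convert it to a "dist" object
    data(dutchSpeakersDist)
    dutchSpeakersDist.d <- as.dist(dutchSpeakersDist)

    # Classical multidimensional scaling, keeping three dimensions
    dutchSpeakersDist.mds <- cmdscale(dutchSpeakersDist.d, k = 3)

    # Combine the MDS coordinates (columns X1, X2, X3) with speaker metadata
    data(dutchSpeakersDistMeta)
    dat <- data.frame(dutchSpeakersDist.mds,
       Sex = dutchSpeakersDistMeta$Sex,
       Year = dutchSpeakersDistMeta$AgeYear,
       EduLevel = dutchSpeakersDistMeta$EduLevel)

    # Drop speakers whose year of birth is unknown
    dat <- dat[!is.na(dat$Year), ]

    # Dimension 1 against year of birth, and dimension 3 by sex
    par(mfrow = c(1, 2))
    plot(dat$Year, dat$X1, xlab = "year of birth",
       ylab = "dimension 1", type = "p")
    lines(lowess(dat$Year, dat$X1))
    boxplot(X3 ~ Sex, data = dat, ylab = "dimension 3")
    par(mfrow = c(1, 1))

    # Spearman rank correlation and Welch two-sample t-test
    cor.test(dat$X1, dat$Year, method = "spearman")
    t.test(X3 ~ Sex, data = dat)

## End(Not run)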
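The example above refers to columns X1 and X3 without defining them. These names arise because cmdscale returns a coordinate matrix without column names, and data.frame then assigns the default names X1, X2, and so on. The sketch below shows this on invented data that needs no package; the eig = TRUE argument and the goodness-of-fit check are additions for illustration and are not part of the original example.

```r
# Ten invented points in three dimensions, used only to build a
# small distance object; none of this is taken from dutchSpeakersDist.
set.seed(42)
pts <- matrix(rnorm(30), nrow = 10)
toy.d <- dist(pts)

# Classical MDS in two dimensions. With eig = TRUE the result is a
# list whose "points" component holds the coordinate matrix.
toy.mds <- cmdscale(toy.d, k = 2, eig = TRUE)
dim(toy.mds$points)               # 10 rows (objects), 2 columns (dimensions)

# The coordinate matrix has no column names, so data.frame()
# falls back to its default names.
coords <- data.frame(toy.mds$points)
names(coords)                     # "X1" "X2"

# Two goodness-of-fit values based on the eigenvalues; values near 1
# suggest that k dimensions reproduce the distances well.
gof <- toy.mds$GOF
```

Checking the goodness of fit in this way is one option for judging whether k = 3 is an adequate number of dimensions for the speaker distances.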

languageR

Analyzing Linguistic Data: A Practical Introduction to Statistics

v1.5.0

GPL (>= 2)

Authors

R. H. Baayen <harald.baayen@uni-tuebingen.de>, Elnaz Shafaei-Bajestan <elnaz.shafaei-bajestan@uni-tuebingen.de>

Initial release

2019-01-28