Frequency distribution tables, histograms and polygons
The fdth package provides a set of functions that make it easy to build
frequency distribution tables (fdt) and their associated histograms and
frequency polygons (absolute, relative and cumulative).
The fdt can be formatted in many ways to suit different kinds of
publication (papers, books, etc.).
The plot method (S3) draws the histogram with the ease and flexibility of
a high-level function.
The frequency of a particular observation is the number of times the observation occurs in the data. The distribution of a variable is the pattern of frequencies of its observations.
A frequency distribution table (fdt) can be used for ordinal, continuous and categorical variables.
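To make the definition of frequency concrete, here is a minimal sketch that uses only base R (not fdth) on a made-up vector; the names obs, freq and rel_freq are illustrative only.

```r
# Base R only: count how often each distinct observation occurs.
obs <- c("a", "b", "a", "c", "a", "b")  # made-up sample data

freq     <- table(obs)        # absolute frequencies
rel_freq <- prop.table(freq)  # relative frequencies (they sum to 1)

print(freq)
print(round(rel_freq, 2))
```

The pair of vectors freq and rel_freq is the simplest form of a frequency distribution; a fdt adds class intervals and cumulative columns on top of this idea.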
The R environment provides a set of functions (generally low level)
that let the user build a fdt and its associated graphical representation,
the histogram. A fdt plays an important role in summarizing data and
is the basis for estimating the probability density functions used in
parametric inference.
However, for novices or occasional users of R, it can be laborious to
find all the functions and graphical parameters needed to produce a normalized,
well-formatted fdt and its associated histogram ready for publication.
That is the aim of this package: to let the user build both the fdt and
the histogram easily and flexibly. The most common input for univariate data is
a vector. For multivariate data, the input can be either a data.frame,
in which case all numerical variables can also be grouped according to one
categorical variable, or a matrix.
The simplest way to run fdt and fdt_cat is to supply only the x
object, for example: d <- fdt(x). In this case the default values of
breaks and right ("Sturges" and FALSE, respectively) are used.
If the x object is categorical, use d <- fdt_cat(x) instead.
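The effect of the right option is easiest to see on values that fall exactly on a class limit. The sketch below uses base R's cut() rather than fdth, on the assumption that right has the same meaning in both (right = FALSE gives intervals closed on the left and open on the right); the data and class limits are made up.

```r
# Base R illustration of left-closed versus right-closed class intervals.
x   <- c(1, 2, 2, 3, 4)  # made-up data with values on the class limits
brk <- c(1, 2, 3, 5)     # hypothetical class limits

# right = FALSE: classes [1,2), [2,3), [3,5)
left_closed <- table(cut(x, breaks = brk, right = FALSE))

# right = TRUE: classes [1,2], (2,3], (3,5]; include.lowest keeps the value 1
right_closed <- table(cut(x, breaks = brk, right = TRUE, include.lowest = TRUE))

print(left_closed)
print(right_closed)
```

The two values equal to 2 are counted in the second class when right = FALSE and in the first class when right = TRUE, so the choice changes the table whenever observations coincide with class limits.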
If the variable is continuous, you can also supply:
x and k (number of class intervals);
x, start (left endpoint of the first class interval) and end (right endpoint of the last class interval); or
x, start, end and h (class interval width).
These options make building the fdt easy and flexible.
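The three ways of specifying the class intervals listed above can be sketched as follows. This assumes the fdth package is installed; the seed, the simulated data and the object names tb_k, tb_se and tb_seh are illustrative choices, not part of the package.

```r
library(fdth)

set.seed(1)  # arbitrary seed so the sketch is repeatable
x <- rnorm(n = 1e3, mean = 5, sd = 1)

tb_k   <- fdt(x, k = 10)                     # x and k
tb_se  <- fdt(x, start = 1, end = 9)         # x, start and end
tb_seh <- fdt(x, start = 1, end = 9, h = 1)  # x, start, end and h

print(tb_seh[["breaks"]])  # start, end, h and right actually used
print(tb_seh)
```

Supplying start, end and h together gives full control over the class limits, which is usually what a published table needs; k alone is a quick way to change only the level of detail.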
The fdt and fdt_cat objects store the information used by the summary,
print and plot methods. The result of plot is a histogram or
polygon (absolute, relative or cumulative).
The summary, print and plot methods provide a reasonable
set of parameters to format and plot the fdt object in a well-formatted
(and publishable) way.
José Cláudio Faria
Enio G. Jelihovschi
Ivan B. Allaman
library(fdth)

## Numerical
#======================
# Vectors: univariate
#======================
x <- rnorm(n=1e3, mean=5, sd=1)

(tb <- fdt(x))

# Histograms
plot(tb)                  # Absolute frequency histogram
plot(tb, main='My title')
plot(tb, x.round=3, col='darkgreen')
plot(tb, xlas=2)
plot(tb, x.round=3, xlas=2, xlab=NULL)
plot(tb, v=TRUE, cex=.8, x.round=3, xlas=2, xlab=NULL, col=rainbow(11))

plot(tb, type='fh')       # Absolute frequency histogram
plot(tb, type='rfh')      # Relative frequency histogram
plot(tb, type='rfph')     # Relative frequency (%) histogram
plot(tb, type='cdh')      # Cumulative density histogram
plot(tb, type='cfh')      # Cumulative frequency histogram
plot(tb, type='cfph')     # Cumulative frequency (%) histogram

# Polygons
plot(tb, type='fp')       # Absolute frequency polygon
plot(tb, type='rfp')      # Relative frequency polygon
plot(tb, type='rfpp')     # Relative frequency (%) polygon
plot(tb, type='cdp')      # Cumulative density polygon
plot(tb, type='cfp')      # Cumulative frequency polygon
plot(tb, type='cfpp')     # Cumulative frequency (%) polygon

# Density
plot(tb, type='d')        # Density

# Summary
tb
summary(tb)               # the same
print(tb)                 # the same
show(tb)                  # the same

summary(tb, format=TRUE)  # May not be what you want for publication!
summary(tb, format=TRUE, pattern='%.2f')  # Hmm, good, but can it be better?
summary(tb, col=c(1:2, 4, 6), format=TRUE, pattern='%.2f')  # Yes, it can!

range(x)                  # To know x

summary(fdt(x, start=1, end=9, h=1),
        col=c(1:2, 4, 6),
        format=TRUE,
        pattern='%d')     # Is it nice now?

# The fdt object
tb[['table']]                         # Stores the freq. dist. table (fdt)
tb[['breaks']]                        # Stores the breaks of fdt
tb[['breaks']]['start']               # Stores the left value of the first class
tb[['breaks']]['end']                 # Stores the right value of the last class
tb[['breaks']]['h']                   # Stores the class interval
as.logical(tb[['breaks']]['right'])   # Stores the right option

# Theoretical curve and fdt
y <- rnorm(1e5, mean=5, sd=1)

tb <- fdt(y, k=100)

plot(tb,
     type='d',            # density
     col=heat.colors(100))

curve(dnorm(x, mean=5, sd=1), n=1e3, add=TRUE, lwd=4)

#=============================================
# Data.frames: multivariate with categorical
#=============================================
mdf <- data.frame(X1=rep(LETTERS[1:4], 25),
                  X2=as.factor(rep(1:10, 10)),
                  Y1=c(NA, NA, rnorm(96, 10, 1), NA, NA),
                  Y2=rnorm(100, 60, 4),
                  Y3=rnorm(100, 50, 4),
                  Y4=rnorm(100, 40, 4),
                  stringsAsFactors=TRUE)

(tb <- fdt(mdf))

# Histograms
plot(tb, v=TRUE)
plot(tb, col=rainbow(8))
plot(tb, type='fh')
plot(tb, type='rfh')
plot(tb, type='rfph')
plot(tb, type='cdh')
plot(tb, type='cfh')
plot(tb, type='cfph')

# Polygons
plot(tb, v=TRUE, type='fp')
plot(tb, type='rfp')
plot(tb, type='rfpp')
plot(tb, type='cdp')
plot(tb, type='cfp')
plot(tb, type='cfpp')

# Density
plot(tb, type='d')

# Summary
tb
summary(tb)               # the same
print(tb)                 # the same
show(tb)                  # the same

summary(tb, format=TRUE)
summary(tb, format=TRUE, pattern='%05.2f')  # sprintf-style format string
summary(tb, col=c(1:2, 4, 6), format=TRUE, pattern='%05.2f')

print(tb, col=c(1:2, 4, 6))
print(tb, col=c(1:2, 4, 6), format=TRUE, pattern='%05.2f')

# Using by
levels(mdf$X1)

plot(fdt(mdf, k=5, by='X1'), col=rainbow(5))

levels(mdf$X2)

summary(fdt(iris, k=5), format=TRUE, pattern='%04.2f')

plot(fdt(iris, k=5), col=rainbow(5))

levels(iris$Species)

summary(fdt(iris, k=5, by='Species'), format=TRUE, pattern='%04.2f')

plot(fdt(iris, k=5, by='Species'), v=TRUE)

#=========================
# Matrices: multivariate
#=========================
summary(fdt(state.x77), col=c(1:2, 4, 6), format=TRUE)

plot(fdt(state.x77))

# Very big
summary(fdt(volcano, right=TRUE),
        col=c(1:2, 4, 6),
        round=3,
        format=TRUE,
        pattern='%05.1f')

plot(fdt(volcano, right=TRUE))

## Categorical
x <- sample(x=letters[1:5], size=5e2, rep=TRUE)

(fdt.c <- fdt_cat(x))
(fdt.c <- fdt_cat(x, sort=FALSE))

##================================================
## Data.frame: multivariate with two categorical
##================================================
mdf <- data.frame(c1=sample(LETTERS[1:3], 1e2, rep=TRUE),
                  c2=as.factor(sample(1:10, 1e2, rep=TRUE)),
                  n1=c(NA, NA, rnorm(96, 10, 1), NA, NA),
                  n2=rnorm(100, 60, 4),
                  n3=rnorm(100, 50, 4),
                  stringsAsFactors=TRUE)

head(mdf)

(fdt.c <- fdt_cat(mdf))
(fdt.c <- fdt_cat(mdf, dec=FALSE))
(fdt.c <- fdt_cat(mdf, sort=FALSE))
(fdt.c <- fdt_cat(mdf, by='c1'))

##================================================
## Matrix: two categorical
##================================================
x <- matrix(sample(x=letters[1:10], size=100, rep=TRUE),
            nc=2,
            dimnames=list(NULL, c('c1', 'c2')))

head(x)

(fdt.c <- fdt_cat(x))