Maximum goodness-of-fit fit of univariate continuous distributions
Fits a univariate continuous distribution to non-censored data by maximizing goodness-of-fit (or, equivalently, minimizing a goodness-of-fit distance).
mgedist(data, distr, gof = "CvM", start = NULL, fix.arg = NULL, optim.method = "default", lower = -Inf, upper = Inf, custom.optim = NULL, silent = TRUE, gradient = NULL, checkstartfix=FALSE, ...)
data |
A numeric vector for non censored data. |
distr |
A character string |
gof |
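A character string coding for the name of the goodness-of-fit distance used: "CvM" for the Cramer-von Mises distance, "KS" for the Kolmogorov-Smirnov distance, "AD" for the Anderson-Darling distance, and "ADR", "ADL", "AD2R", "AD2L" or "AD2" for the variants of the Anderson-Darling distance described in the details. |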
start |
A named list giving the initial values of parameters of the named distribution
or a function of data computing initial values and returning a named list.
This argument may be omitted (default) for some distributions for which reasonable
starting values are computed (see the 'details' section of |
fix.arg |
An optional named list giving the values of fixed parameters of the named distribution or a function of data computing (fixed) parameter values and returning a named list. Parameters with fixed value are thus NOT estimated. |
optim.method |
|
lower |
Left bounds on the parameters for the |
upper |
Right bounds on the parameters for the |
custom.optim |
a function carrying the optimization. |
silent |
A logical to remove or show warnings when bootstrapping. |
gradient |
A function to return the gradient of the gof distance for the |
checkstartfix |
A logical to test starting and fixed values. Do not change it. |
... |
further arguments passed to the |
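As an illustration of how start and fix.arg can be combined, the sketch below holds the Weibull shape fixed and estimates only the scale. It assumes the fitdistrplus package (and its groundbeef dataset) is installed; the fixed shape of 2 and the starting scale of 70 are arbitrary illustrative values, not recommendations.

```r
library(fitdistrplus)

data(groundbeef)
serving <- groundbeef$serving

# Hold shape fixed at an arbitrary illustrative value; only scale is estimated.
fit_fixed <- mgedist(serving, "weibull", gof = "CvM",
                     start = list(scale = 70),
                     fix.arg = list(shape = 2))

fit_fixed$estimate   # contains only the scale parameter
fit_fixed$fix.arg    # the fixed shape, returned as a named list
```

Because shape is supplied through fix.arg, it does not appear in the estimate component.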
The mgedist function numerically maximizes goodness-of-fit, or minimizes a goodness-of-fit distance coded by the argument gof. One may use one of the classical distances defined in Stephens (1986): the Cramer-von Mises distance ("CvM"), the Kolmogorov-Smirnov distance ("KS") or the Anderson-Darling distance ("AD"), which gives more weight to the tails of the distribution, or one of the variants of this last distance proposed by Luceno (2006). The right-tail AD ("ADR") gives more weight only to the right tail, and the left-tail AD ("ADL") gives more weight only to the left tail. Either of the tails, or both of them, can receive even larger weights by using second-order Anderson-Darling statistics ("AD2R", "AD2L" or "AD2").
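To make the idea of minimizing a goodness-of-fit distance concrete, here is a minimal base R sketch for the Cramer-von Mises case. It is not the package's implementation: cvm_distance is a hypothetical helper, the data are simulated, and the log-parametrization is an assumption made here only to keep both Weibull parameters positive during optimization.

```r
# Hypothetical helper: Cramer-von Mises distance between the empirical
# distribution of x and a Weibull distribution.
# par = c(log(shape), log(scale)), so the optimizer is unconstrained.
cvm_distance <- function(par, x) {
  n  <- length(x)
  Fx <- pweibull(sort(x), shape = exp(par[1]), scale = exp(par[2]))
  1 / (12 * n) + sum((Fx - (2 * seq_len(n) - 1) / (2 * n))^2)
}

set.seed(1234)
x <- rweibull(200, shape = 2, scale = 50)   # simulated data, true values known

# Minimize the distance with optim() (Nelder-Mead by default).
opt <- optim(c(0, log(mean(x))), cvm_distance, x = x)
est <- c(shape = exp(opt$par[1]), scale = exp(opt$par[2]))

est         # estimates on the original parameter scale
opt$value   # minimized Cramer-von Mises distance
```

The other distances follow the same pattern, with a different function of the sorted fitted probabilities in place of the sum of squares.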
The optimization process is the same as mledist; see the 'details' section of that function.
This function is intended to be used only with continuous distributions and weighted maximum goodness-of-fit estimation is not allowed.
NB: if your data values are particularly small or large, a scaling may be needed before the optimization process. See example (4).
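A possible way to handle such a case is to fit the rescaled data and then transform the estimates back. The sketch below assumes fitdistrplus is installed and uses an arbitrary factor of 10^4; dividing the estimates by the factor is valid here because both Cauchy parameters (location and scale) scale linearly with the data, which would not hold for, say, a shape parameter.

```r
library(fitdistrplus)

set.seed(1234)
x2 <- rnorm(100, 1e-4, 2e-4)   # particularly small values

k <- 1e4                        # arbitrary illustrative scaling factor
fit_scaled <- mgedist(x2 * k, "cauchy")

# Location and scale are both multiplied by k when the data are,
# so dividing by k returns estimates on the original data scale.
est_original <- fit_scaled$estimate / k
est_original
```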
mgedist returns a list with the following components:
estimate |
the parameter estimates. |
convergence |
an integer code for the convergence of |
value |
the minimal value reached for the criterion to minimize. |
hessian |
a symmetric matrix computed by |
optim.function |
the name of the optimization function used to minimize the goodness-of-fit distance. |
optim.method |
when |
fix.arg |
the named list giving the values of parameters of the named distribution
that must be kept fixed rather than estimated by maximum likelihood or |
fix.arg.fun |
the function used to set the value of |
weights |
the vector of weights used in the estimation process or |
counts |
A two-element integer vector giving the number of calls
to the log-likelihood function and its gradient respectively.
This excludes those calls needed to compute the Hessian, if requested,
and any calls to log-likelihood function to compute a finite-difference
approximation to the gradient. |
optim.message |
A character string giving any additional information
returned by the optimizer, or |
loglik |
the log-likelihood value. |
gof |
the code of the goodness-of-fit distance minimized. |
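As a short usage sketch (assuming fitdistrplus and its groundbeef dataset are available), the main components of the returned list can be inspected as follows. The choice of the Anderson-Darling distance is only for illustration.

```r
library(fitdistrplus)

data(groundbeef)
fit <- mgedist(groundbeef$serving, "weibull", gof = "AD")

fit$estimate      # named vector of parameter estimates (shape, scale)
fit$value         # value of the distance at the optimum
fit$convergence   # integer convergence code; 0 indicates success
fit$gof           # code of the distance that was used
```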
Marie-Laure Delignette-Muller and Christophe Dutang.
Luceno A (2006), Fitting the generalized Pareto distribution to data using maximum goodness-of-fit estimators. Computational Statistics and Data Analysis, 51, 904-917.
Stephens MA (1986), Tests based on edf statistics. In Goodness-of-fit techniques (D'Agostino RB and Stephens MA, eds), Marcel Dekker, New York, pp. 97-194.
Delignette-Muller ML and Dutang C (2015), fitdistrplus: An R Package for Fitting Distributions. Journal of Statistical Software, 64(4), 1-34.
# (1) Fit of a Weibull distribution to serving size data by maximum
# goodness-of-fit estimation using all the distances available
#
library(fitdistrplus)
data(groundbeef)
serving <- groundbeef$serving
mgedist(serving, "weibull", gof="CvM")
mgedist(serving, "weibull", gof="KS")
mgedist(serving, "weibull", gof="AD")
mgedist(serving, "weibull", gof="ADR")
mgedist(serving, "weibull", gof="ADL")
mgedist(serving, "weibull", gof="AD2R")
mgedist(serving, "weibull", gof="AD2L")
mgedist(serving, "weibull", gof="AD2")

# (2) Fit of a uniform distribution using Cramer-von Mises or
# Kolmogorov-Smirnov distance
#
set.seed(1234)
u <- runif(100,min=5,max=10)
mgedist(u,"unif",gof="CvM")
mgedist(u,"unif",gof="KS")

# (3) Fit of a triangular distribution using Cramer-von Mises or
# Kolmogorov-Smirnov distance
#
## Not run:
require(mc2d)
set.seed(1234)
t <- rtriang(100,min=5,mode=6,max=10)
mgedist(t,"triang",start = list(min=4, mode=6,max=9),gof="CvM")
mgedist(t,"triang",start = list(min=4, mode=6,max=9),gof="KS")
## End(Not run)

# (4) scaling problem
# the simulated dataset (below) has particularly small values, hence without scaling (10^0),
# the optimization raises an error. The for loop shows how scaling by 10^i
# for i=1,...,6 makes the fitting procedure work correctly.
set.seed(1234)
x2 <- rnorm(100, 1e-4, 2e-4)
for(i in 6:0)
    cat(i, try(mgedist(x2*10^i,"cauchy")$estimate, silent=TRUE), "\n")