Smooth Parameter Estimation and Bootstrapping of Generalized Pareto Distributions with Penalized Maximum Likelihood Estimation
gamGPDfit()
fits the parameters of a generalized Pareto
distribution (GPD) depending on covariates in a non- or semiparametric
way.
gamGPDboot()
fits and bootstraps the parameters of a GPD
distribution depending on covariates in a non- or semiparametric
way. Applies the post-blackend bootstrap of Chavez-Demoulin and
Davison (2005).
gamGPDfit(x, threshold, nexc=NULL, datvar, xiFrhs, nuFrhs, init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1], niter=32, include.updates=FALSE, epsxi=1e-05, epsnu=1e-05, progress=TRUE, verbose=FALSE, ...) gamGPDboot(x, B, threshold, nexc=NULL, datvar, xiFrhs, nuFrhs, init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1], niter=32, include.updates=FALSE, epsxi=1e-5, epsnu=1e-5, boot.progress=TRUE, progress=FALSE, verbose=FALSE, debug=FALSE, ...)
x |
data.frame containing the losses (in some component; can be
specified with the argument |
B |
number of bootstrap replications. |
threshold |
threshold of the peaks-over-threshold (POT) method. |
nexc |
number of excesses. This can be used to determine |
datvar |
name of the data column in |
xiFrhs |
right-hand side of the formula for xi in
the |
nuFrhs |
right-hand side of the formula for nu in
the |
init |
bivariate vector containing initial values for (xi, beta). |
niter |
maximal number of iterations in the backfitting algorithm. |
include.updates |
|
epsxi |
epsilon for stop criterion for xi. |
epsnu |
epsilon for stop criterion for nu. |
boot.progress |
|
progress |
|
verbose |
|
debug |
|
... |
additional arguments passed to |
gamGPDfit()
fits the parameters xi and
beta of the generalized Pareto distribution
GPD(xi,beta) depending on covariates in
a non- or semiparametric way. The distribution function is given by
G[xi,beta](x)=1-(1+xi x/beta)^(-1/xi), x>=0,
for xi>0 (which is what we assume) and
beta>0. Note that β is also denoted by
σ in this package. Estimation of xi
and beta by gamGPDfit()
is done via penalized maximum
likelihood estimation, where the estimators are computed with a
backfitting algorithm. In order to guarantee convergence of this
algorithm, a reparameterization of beta in terms of the parameter
nu is done via
beta=exp(nu)/(1+xi).
The parameters xi and nu (and thus beta) are allowed to depend on covariates (including time) in a non- or semiparametric way, for example:
xi=xi(x,t)=x^Talpha[xi]+h[xi](t),
nu=nu(x,t)=x^Talpha[nu]+h[nu](t),
where x denotes the vector of covariates, alpha[xi], alpha[nu] are parameter vectors and h[xi], h[nu] are regression splines. For more details, see the references and the source code.
gamGPDboot()
first fits the GPD parameters via
gamGPDfit()
. It then conducts the post-blackend bootstrap of
Chavez-Demoulin and Davison (2005). To this end, it computes the
residuals, resamples them (B
times), reconstructs the
corresponding excesses, and refits the GPD parameters via
gamGPDfit()
again.
gamGPDfit()
returns a list with the components
xi
:estimated parameters xi;
beta
:estimated parameters beta;
nu
:estimated parameters nu;
se.xi
:standard error for xi ((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to xi) multiplied by -1;
se.nu
:standard error for nu ((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to nu) multiplied by -1;
xi.covar
:(unique) covariates for xi;
nu.covar
:(unique) covariates for nu;
covar
:available covariate combinations used for fitting beta(xi, nu);
y
:vector of excesses (exceedances minus threshold);
res
:residuals;
MRD
:mean relative distances between for all iterations, calculated between old parameters (xi, nu) (from the last iteration) and new parameters (currently estimated ones);
logL
:log-likelihood at the estimated parameters;
xiObj
:R object of type gamObject
for estimated
xi (returned by mgcv::gam()
);
nuObj
:R object of type gamObject
for estimated
nu (returned by mgcv::gam()
);
xiUpdates
:if include.updates
is
TRUE
, updates for xi for each
iteration. This is a list of R objects of type gamObject
which contains xiObj
as last element;
nuUpdates
:if include.updates
is
TRUE
, updates for nu for each
iteration. This is a list of R objects of type gamObject
which contains nuObj
as last element;
gamGPDboot()
returns a list of length B+1
where the
first component contains the results of
the initial fit via gamGPDfit()
and the other B
components contain the results for each replication of the
post-blackend bootstrap.
Marius Hofert, Valerie Chavez-Demoulin.
Chavez-Demoulin, V., and Davison, A. C. (2005), Generalized additive models for sample extremes, Applied Statistics 54(1), 207–222.
Chavez-Demoulin, V., and Hofert, M. (to be submitted), Smooth extremal models fitted by penalized maximum likelihood estimation.
### Example 1: fitting capability ############################################## ## generate an example data set years <- 2003:2012 # years nyears <- length(years) n <- 250 # sample size for each (different) xi u <- 200 # threshold rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD set.seed(17) # setting seed xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A" ## generate losses for group "A" lossA <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.A[y], beta=1))) xi.true.B <- xi.true.A^2 # true xi for group "B" ## generate losses for group "B" lossB <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.B[y], beta=1))) ## build data frame time <- rep(rep(years, each=n), 2) # "2" stands for the two groups covar <- rep(c("A","B"), each=n*nyears) value <- c(lossA, lossB) x <- data.frame(covar=covar, time=time, value=value) ## fit eps <- 1e-3 # to decrease the run time for this example fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar+s(time)-1, nuFrhs=~covar+s(time)-1, epsxi=eps, epsnu=eps) ## note: choosing s(..., bs="cr") will fit cubic splines ## grab the fitted values per group and year xi.fit <- fitted(fit$xiObj) xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # pick fit for each group and year xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year ## plot the fitted values of xi and the true ones we simulated from par(mfrow=c(1,2)) plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A), main="Group A", xlab="Year", ylab=expression(xi)) points(years, xi.fit.A, type="l", col="red") legend("topleft", inset=0.04, lty=1, col=c("black", "red"), legend=c("true", "fitted"), bty="n") plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B), main="Group B", xlab="Year", ylab=expression(xi)) points(years, xi.fit.B, type="l", col="blue") legend("topleft", inset=0.04, lty=1, col=c("black", "blue"), legend=c("true", "fitted"), bty="n") ## Not run: ### Example 2: Comparison of (the more general) gamGPDfit() with gpd.fit() ######## set.seed(17) # setting seed xi.true.A <- rep(0.4, length=nyears) xi.true.B <- rep(0.8, length=nyears) ## generate losses for group "A" lossA <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.A[y], beta=1))) ## generate losses for group "B" lossB <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.B[y], beta=1))) ## build data frame x <- data.frame(covar=covar, time=time, value=c(lossA, lossB)) ## fit with gpd.fit fit.coles <- gpd.fit(x$value, threshold=u, shl=1, sigl=1, ydat=x) xi.fit.coles.A <- fit.coles$mle[3]+1*fit.coles$mle[4] xi.fit.coles.B <- fit.coles$mle[3]+2*fit.coles$mle[4] ## fit with gamGPDfit() fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar, nuFrhs=~covar, epsxi=eps, epsnu=eps) xi.fit <- fitted(fit$xiObj) xi.fit.A <- as.numeric(xi.fit[1]) # fit for group "A" xi.fit.B <- as.numeric(xi.fit[nyears*n+1]) # fit for group "B" ## comparison xi.fit.A-xi.fit.coles.A xi.fit.B-xi.fit.coles.B ## End(Not run) # dontrun
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.