Smooth Parameter Estimation and Bootstrapping of Generalized Pareto Distributions with Penalized Maximum Likelihood Estimation
gamGPDfit() fits the parameters of a generalized Pareto
distribution (GPD) depending on covariates in a non- or semiparametric
way.
gamGPDboot() fits and bootstraps the parameters of a GPD
distribution depending on covariates in a non- or semiparametric
way. Applies the post-blackend bootstrap of Chavez-Demoulin and
Davison (2005).
gamGPDfit(x, threshold, nexc=NULL, datvar, xiFrhs, nuFrhs,
init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1],
niter=32, include.updates=FALSE, epsxi=1e-05, epsnu=1e-05,
progress=TRUE, verbose=FALSE, ...)
gamGPDboot(x, B, threshold, nexc=NULL, datvar, xiFrhs, nuFrhs,
init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1],
niter=32, include.updates=FALSE, epsxi=1e-5, epsnu=1e-5,
boot.progress=TRUE, progress=FALSE, verbose=FALSE, debug=FALSE, ...)x |
data.frame containing the losses (in some component; can be
specified with the argument |
B |
number of bootstrap replications. |
threshold |
threshold of the peaks-over-threshold (POT) method. |
nexc |
number of excesses. This can be used to determine |
datvar |
name of the data column in |
xiFrhs |
right-hand side of the formula for xi in
the |
nuFrhs |
right-hand side of the formula for nu in
the |
init |
bivariate vector containing initial values for (xi, beta). |
niter |
maximal number of iterations in the backfitting algorithm. |
include.updates |
|
epsxi |
epsilon for stop criterion for xi. |
epsnu |
epsilon for stop criterion for nu. |
boot.progress |
|
progress |
|
verbose |
|
debug |
|
... |
additional arguments passed to |
gamGPDfit() fits the parameters xi and
beta of the generalized Pareto distribution
GPD(xi,beta) depending on covariates in
a non- or semiparametric way. The distribution function is given by
G[xi,beta](x)=1-(1+xi x/beta)^(-1/xi), x>=0,
for xi>0 (which is what we assume) and
beta>0. Note that β is also denoted by
σ in this package. Estimation of xi
and beta by gamGPDfit() is done via penalized maximum
likelihood estimation, where the estimators are computed with a
backfitting algorithm. In order to guarantee convergence of this
algorithm, a reparameterization of beta in terms of the parameter
nu is done via
beta=exp(nu)/(1+xi).
The parameters xi and nu (and thus beta) are allowed to depend on covariates (including time) in a non- or semiparametric way, for example:
xi=xi(x,t)=x^Talpha[xi]+h[xi](t),
nu=nu(x,t)=x^Talpha[nu]+h[nu](t),
where x denotes the vector of covariates, alpha[xi], alpha[nu] are parameter vectors and h[xi], h[nu] are regression splines. For more details, see the references and the source code.
gamGPDboot() first fits the GPD parameters via
gamGPDfit(). It then conducts the post-blackend bootstrap of
Chavez-Demoulin and Davison (2005). To this end, it computes the
residuals, resamples them (B times), reconstructs the
corresponding excesses, and refits the GPD parameters via
gamGPDfit() again.
gamGPDfit() returns a list with the components
xi:estimated parameters xi;
beta:estimated parameters beta;
nu:estimated parameters nu;
se.xi:standard error for xi ((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to xi) multiplied by -1;
se.nu:standard error for nu ((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to nu) multiplied by -1;
xi.covar:(unique) covariates for xi;
nu.covar:(unique) covariates for nu;
covar:available covariate combinations used for fitting beta(xi, nu);
y:vector of excesses (exceedances minus threshold);
res:residuals;
MRD:mean relative distances between for all iterations, calculated between old parameters (xi, nu) (from the last iteration) and new parameters (currently estimated ones);
logL:log-likelihood at the estimated parameters;
xiObj:R object of type gamObject for estimated
xi (returned by mgcv::gam());
nuObj:R object of type gamObject for estimated
nu (returned by mgcv::gam());
xiUpdates:if include.updates is
TRUE, updates for xi for each
iteration. This is a list of R objects of type gamObject
which contains xiObj as last element;
nuUpdates:if include.updates is
TRUE, updates for nu for each
iteration. This is a list of R objects of type gamObject
which contains nuObj as last element;
gamGPDboot() returns a list of length B+1 where the
first component contains the results of
the initial fit via gamGPDfit() and the other B
components contain the results for each replication of the
post-blackend bootstrap.
Marius Hofert, Valerie Chavez-Demoulin.
Chavez-Demoulin, V., and Davison, A. C. (2005), Generalized additive models for sample extremes, Applied Statistics 54(1), 207–222.
Chavez-Demoulin, V., and Hofert, M. (to be submitted), Smooth extremal models fitted by penalized maximum likelihood estimation.
### Example 1: fitting capability ##############################################
## generate an example data set
years <- 2003:2012 # years
nyears <- length(years)
n <- 250 # sample size for each (different) xi
u <- 200 # threshold
rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD
set.seed(17) # setting seed
xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A"
## generate losses for group "A"
lossA <- unlist(lapply(1:nyears,
function(y) u + rGPD(n, xi=xi.true.A[y], beta=1)))
xi.true.B <- xi.true.A^2 # true xi for group "B"
## generate losses for group "B"
lossB <- unlist(lapply(1:nyears,
function(y) u + rGPD(n, xi=xi.true.B[y], beta=1)))
## build data frame
time <- rep(rep(years, each=n), 2) # "2" stands for the two groups
covar <- rep(c("A","B"), each=n*nyears)
value <- c(lossA, lossB)
x <- data.frame(covar=covar, time=time, value=value)
## fit
eps <- 1e-3 # to decrease the run time for this example
fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar+s(time)-1,
nuFrhs=~covar+s(time)-1, epsxi=eps, epsnu=eps)
## note: choosing s(..., bs="cr") will fit cubic splines
## grab the fitted values per group and year
xi.fit <- fitted(fit$xiObj)
xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # pick fit for each group and year
xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year
xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year
## plot the fitted values of xi and the true ones we simulated from
par(mfrow=c(1,2))
plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A),
main="Group A", xlab="Year", ylab=expression(xi))
points(years, xi.fit.A, type="l", col="red")
legend("topleft", inset=0.04, lty=1, col=c("black", "red"),
legend=c("true", "fitted"), bty="n")
plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B),
main="Group B", xlab="Year", ylab=expression(xi))
points(years, xi.fit.B, type="l", col="blue")
legend("topleft", inset=0.04, lty=1, col=c("black", "blue"),
legend=c("true", "fitted"), bty="n")
## Not run:
### Example 2: Comparison of (the more general) gamGPDfit() with gpd.fit() ########
set.seed(17) # setting seed
xi.true.A <- rep(0.4, length=nyears)
xi.true.B <- rep(0.8, length=nyears)
## generate losses for group "A"
lossA <- unlist(lapply(1:nyears,
function(y) u + rGPD(n, xi=xi.true.A[y], beta=1)))
## generate losses for group "B"
lossB <- unlist(lapply(1:nyears,
function(y) u + rGPD(n, xi=xi.true.B[y], beta=1)))
## build data frame
x <- data.frame(covar=covar, time=time, value=c(lossA, lossB))
## fit with gpd.fit
fit.coles <- gpd.fit(x$value, threshold=u, shl=1, sigl=1, ydat=x)
xi.fit.coles.A <- fit.coles$mle[3]+1*fit.coles$mle[4]
xi.fit.coles.B <- fit.coles$mle[3]+2*fit.coles$mle[4]
## fit with gamGPDfit()
fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar, nuFrhs=~covar,
epsxi=eps, epsnu=eps)
xi.fit <- fitted(fit$xiObj)
xi.fit.A <- as.numeric(xi.fit[1]) # fit for group "A"
xi.fit.B <- as.numeric(xi.fit[nyears*n+1]) # fit for group "B"
## comparison
xi.fit.A-xi.fit.coles.A
xi.fit.B-xi.fit.coles.B
## End(Not run) # dontrunPlease choose more modern alternatives, such as Google Chrome or Mozilla Firefox.