ismev: game – R documentation

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

game

Smooth Parameter Estimation and Bootstrapping of Generalized Pareto Distributions with Penalized Maximum Likelihood Estimation

Description

gamGPDfit() fits the parameters of a generalized Pareto distribution (GPD) depending on covariates in a non- or semiparametric way.

gamGPDboot() fits and bootstraps the parameters of a GPD distribution depending on covariates in a non- or semiparametric way. Applies the post-blackend bootstrap of Chavez-Demoulin and Davison (2005).

Usage

gamGPDfit(x, threshold, nexc=NULL, datvar, xiFrhs, nuFrhs,
          init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1],
          niter=32, include.updates=FALSE, epsxi=1e-05, epsnu=1e-05,
          progress=TRUE, verbose=FALSE, ...)
gamGPDboot(x, B, threshold, nexc=NULL, datvar, xiFrhs, nuFrhs,
           init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1],
           niter=32, include.updates=FALSE, epsxi=1e-5, epsnu=1e-5,
           boot.progress=TRUE, progress=FALSE, verbose=FALSE, debug=FALSE, ...)

Arguments

`x`	data.frame containing the losses (in some component; can be specified with the argument `datvar`; the other components contain the covariates).
`B`	number of bootstrap replications.
`threshold`	threshold of the peaks-over-threshold (POT) method.
`nexc`	number of excesses. This can be used to determine
`datvar`	name of the data column in `x` which contains the the data to be modeled.
`xiFrhs`	right-hand side of the formula for xi in the `gam()` call for fitting xi.
`nuFrhs`	right-hand side of the formula for nu in the `gam()` call for fitting nu.
`init`	bivariate vector containing initial values for (xi, beta).
`niter`	maximal number of iterations in the backfitting algorithm.
`include.updates`	`logical` indicating whether updates for xi and nu are returned as well (note: this might lead to objects of large size).
`epsxi`	epsilon for stop criterion for xi.
`epsnu`	epsilon for stop criterion for nu.
`boot.progress`	`logical` indicating whether progress information about `gamGPDboot()` is displayed.
`progress`	`logical` indicating whether progress information about `gamGPDfit()` is displayed. For `gamGPDboot()`, `progress` is only passed to `gamGPDfit()` in the case that `boot.progress==TRUE`.
`verbose`	`logical` indicating whether additional information (in case of undesired behavior) is printed. For `gamGPDboot()`, `progress` is only passed to `gamGPDfit()` if `boot.progress==TRUE`.
`debug`	`logical` indicating whether initial fit (before the bootstrap is initiated) is saved.
`...`	additional arguments passed to `gam()` (which is called internally; see the source code of `gamGPDfitUp()`).

Details

gamGPDfit() fits the parameters xi and beta of the generalized Pareto distribution GPD(xi,beta) depending on covariates in a non- or semiparametric way. The distribution function is given by

G[xi,beta](x)=1-(1+xi x/beta)^(-1/xi), x>=0,

for xi>0 (which is what we assume) and beta>0. Note that β is also denoted by σ in this package. Estimation of xi and beta by gamGPDfit() is done via penalized maximum likelihood estimation, where the estimators are computed with a backfitting algorithm. In order to guarantee convergence of this algorithm, a reparameterization of beta in terms of the parameter nu is done via

beta=exp(nu)/(1+xi).

The parameters xi and nu (and thus beta) are allowed to depend on covariates (including time) in a non- or semiparametric way, for example:

xi=xi(x,t)=x^Talpha[xi]+h[xi](t),

nu=nu(x,t)=x^Talpha[nu]+h[nu](t),

where x denotes the vector of covariates, alpha[xi], alpha[nu] are parameter vectors and h[xi], h[nu] are regression splines. For more details, see the references and the source code.

gamGPDboot() first fits the GPD parameters via gamGPDfit(). It then conducts the post-blackend bootstrap of Chavez-Demoulin and Davison (2005). To this end, it computes the residuals, resamples them (B times), reconstructs the corresponding excesses, and refits the GPD parameters via gamGPDfit() again.

Value

gamGPDfit() returns a list with the components

xi:: estimated parameters xi;
beta:: estimated parameters beta;
nu:: estimated parameters nu;
se.xi:: standard error for xi ((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to xi) multiplied by -1;
se.nu:: standard error for nu ((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to nu) multiplied by -1;
xi.covar:: (unique) covariates for xi;
nu.covar:: (unique) covariates for nu;
covar:: available covariate combinations used for fitting beta(xi, nu);
y:: vector of excesses (exceedances minus threshold);
res:: residuals;
MRD:: mean relative distances between for all iterations, calculated between old parameters (xi, nu) (from the last iteration) and new parameters (currently estimated ones);
logL:: log-likelihood at the estimated parameters;
xiObj:: R object of type gamObject for estimated xi (returned by mgcv::gam());
nuObj:: R object of type gamObject for estimated nu (returned by mgcv::gam());
xiUpdates:: if include.updates is TRUE, updates for xi for each iteration. This is a list of R objects of type gamObject which contains xiObj as last element;
nuUpdates:: if include.updates is TRUE, updates for nu for each iteration. This is a list of R objects of type gamObject which contains nuObj as last element;

gamGPDboot() returns a list of length B+1 where the first component contains the results of the initial fit via gamGPDfit() and the other B components contain the results for each replication of the post-blackend bootstrap.

Author(s)

Marius Hofert, Valerie Chavez-Demoulin.

References

Chavez-Demoulin, V., and Davison, A. C. (2005), Generalized additive models for sample extremes, Applied Statistics 54(1), 207–222.

Chavez-Demoulin, V., and Hofert, M. (to be submitted), Smooth extremal models fitted by penalized maximum likelihood estimation.

Examples

### Example 1: fitting capability ##############################################

## generate an example data set
years <- 2003:2012 # years
nyears <- length(years)
n <- 250 # sample size for each (different) xi
u <- 200 # threshold
rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD

set.seed(17) # setting seed
xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A"
## generate losses for group "A"
lossA <- unlist(lapply(1:nyears,
                       function(y) u + rGPD(n, xi=xi.true.A[y], beta=1)))
xi.true.B <- xi.true.A^2 # true xi for group "B"
## generate losses for group "B"
lossB <- unlist(lapply(1:nyears,
                       function(y) u + rGPD(n, xi=xi.true.B[y], beta=1)))
## build data frame
time <- rep(rep(years, each=n), 2) # "2" stands for the two groups
covar <- rep(c("A","B"), each=n*nyears)
value <- c(lossA, lossB)
x <- data.frame(covar=covar, time=time, value=value)

## fit
eps <- 1e-3 # to decrease the run time for this example
fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar+s(time)-1,
                 nuFrhs=~covar+s(time)-1, epsxi=eps, epsnu=eps)
## note: choosing s(..., bs="cr") will fit cubic splines

## grab the fitted values per group and year
xi.fit <- fitted(fit$xiObj)
xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # pick fit for each group and year
xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year
xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year

## plot the fitted values of xi and the true ones we simulated from
par(mfrow=c(1,2))
plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A),
     main="Group A", xlab="Year", ylab=expression(xi))
points(years, xi.fit.A, type="l", col="red")
legend("topleft", inset=0.04, lty=1, col=c("black", "red"),
       legend=c("true", "fitted"), bty="n")
plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B),
     main="Group B", xlab="Year", ylab=expression(xi))
points(years, xi.fit.B, type="l", col="blue")
legend("topleft", inset=0.04, lty=1, col=c("black", "blue"),
       legend=c("true", "fitted"), bty="n")

## Not run: 
### Example 2: Comparison of (the more general) gamGPDfit() with gpd.fit() ########

set.seed(17) # setting seed
xi.true.A <- rep(0.4, length=nyears)
xi.true.B <- rep(0.8, length=nyears)
## generate losses for group "A"
lossA <- unlist(lapply(1:nyears,
                       function(y) u + rGPD(n, xi=xi.true.A[y], beta=1)))
## generate losses for group "B"
lossB <- unlist(lapply(1:nyears,
                       function(y) u + rGPD(n, xi=xi.true.B[y], beta=1)))
## build data frame
x <- data.frame(covar=covar, time=time, value=c(lossA, lossB))

## fit with gpd.fit
fit.coles <- gpd.fit(x$value, threshold=u, shl=1, sigl=1, ydat=x)
xi.fit.coles.A <- fit.coles$mle[3]+1*fit.coles$mle[4]
xi.fit.coles.B <- fit.coles$mle[3]+2*fit.coles$mle[4]

## fit with gamGPDfit()
fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar, nuFrhs=~covar,
                 epsxi=eps, epsnu=eps)
xi.fit <- fitted(fit$xiObj)
xi.fit.A <- as.numeric(xi.fit[1]) # fit for group "A"
xi.fit.B <- as.numeric(xi.fit[nyears*n+1]) # fit for group "B"

## comparison
xi.fit.A-xi.fit.coles.A
xi.fit.B-xi.fit.coles.B

## End(Not run) # dontrun

ismev

An Introduction to Statistical Modeling of Extreme Values

v1.42

GPL (>= 2)

Authors

Original S functions written by Janet E. Heffernan with R port and R documentation provided by Alec G. Stephenson.

Initial release

2018-05-08