Auxiliary Functions for Boosting crch Models

Description

Auxiliary functions to fit crch models via boosting.

Usage

crch.boost(maxit = 100, nu = 0.1, start = NULL, dot = "separate", 
  mstop = c("max", "aic", "bic", "cv"),  nfolds = 10, foldid = NULL, 
  maxvar = NULL)

crch.boost.fit(x, z, y, left, right, truncated = FALSE, dist = "gaussian",
  df = NULL, link.scale = "log", type = "ml", weights = NULL, offset = NULL, 
  control = crch.boost())

Arguments

maxit the maximum number of boosting iterations.
nu boosting step size. Default is 0.1.
start a previously boosted but not converged “crch.boost” object to continue.
dot character specifying how to process formula parts with a dot (.) on the right-hand side. This can either be “separate” so that each formula part is expanded separately or “sequential” so that the parts are expanded sequentially conditional on all prior parts. Default is “separate”.
mstop method to find the optimum stopping iteration. Default is “max”, which uses maxit. Alternatives are “aic” and “bic” for AIC and BIC optimization and “cv” for cross validation. mstop can also be a positive integer giving the number of boosting iterations; in that case maxit is set to that value and mstop is set to “max”.
nfolds if mstop = “cv”, number of folds in cross validation.
foldid if mstop = “cv”, an optional vector of values between 1 and nfolds identifying the fold each observation is in. If supplied, nfolds can be missing.
maxvar Positive numeric. Maximum number of parameters to be selected during each iteration (not including intercepts). Used for stability selection.
x, z, y, left, right, truncated, dist, df, link.scale, type, weights, offset, control see crch.fit for details.
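
As an illustration of the foldid argument, the following is a minimal sketch of how a balanced fold assignment could be built in base R and passed to crch.boost. The simulated data, the object names (d, foldid, fit_cv) and the choice of 5 folds are assumptions made for this example only; the model is fitted only if the crch package is installed.

```r
# Hypothetical example data, loosely following the Examples section
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), n, 5)
y <- pmax(0, rnorm(n, 1 + x[, 1], exp(-1 + 0.3 * x[, 2])))
d <- data.frame(y = y, x)

# Assign each observation to one of nfolds folds of equal size,
# then shuffle so that fold membership is random
nfolds <- 5
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Fit with cross-validated stopping; foldid is passed via the control list
if (requireNamespace("crch", quietly = TRUE)) {
  fit_cv <- crch::crch(y ~ . | ., data = d, dist = "gaussian", left = 0,
    control = crch::crch.boost(mstop = "cv", foldid = foldid))
  print(fit_cv$mstopopt)
}
```

Supplying foldid explicitly keeps the fold assignment reproducible across fits, which can help when comparing settings such as nu or maxit on the same folds.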

Details

crch.boost extends crch to fit censored (tobit) or truncated regression models with conditional heteroscedasticity by boosting. If crch.boost() is supplied as control in crch, then crch.boost.fit is used as the lower-level fitting function. Note that crch.control() with method = “boosting” is equivalent to crch.boost(). Thus, boosting can more conveniently be called with crch(…, method = “boosting”).

Value

For crch.boost: A list with components named as the arguments. For crch.boost.fit: An object of class “crch.boost”, i.e., a list with the following elements.

coefficients list of coefficients for location and scale. Scale coefficients are in log-scale. Coefficients are those at the optimum stopping iteration specified by mstop.
df if dist = “student”: degrees of freedom of the Student-t distribution, otherwise NULL.
residuals the residuals, that is response minus fitted values.
fitted.values list of fitted location and scale parameters at optimum stopping iteration specified by mstop.
dist assumed distribution for the dependent variable y.
cens list of censoring points.
control list of control parameters.
weights case weights used for fitting.
offset list of offsets for location and scale.
n number of observations.
nobs number of observations with non-zero weights.
loglik log-likelihood.
link a list with element “scale” containing the link objects for the scale model.
truncated logical indicating whether a truncated model has been fitted.
iterations number of boosting iterations.
stepsize boosting stepsize nu.
mstop criterion used to find optimum stopping iteration.
mstopopt optimum stopping iterations for different criteria.
standardize list of center and scale values used to standardize response and regressors.
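
The elements listed above can be read directly from the fitted object. The following is a minimal sketch, assuming the crch package is installed; the simulated data and the names d and fit are made up for this example, and str() is used so that no particular naming of the list entries is assumed.

```r
# Hypothetical example data, loosely following the Examples section
set.seed(2)
n <- 200
x <- matrix(rnorm(n * 5), n, 5)
y <- pmax(0, rnorm(n, 1 + x[, 1], exp(-1 + 0.3 * x[, 2])))
d <- data.frame(y = y, x)

if (requireNamespace("crch", quietly = TRUE)) {
  # Boost for at most 50 iterations and stop at the BIC optimum
  fit <- crch::crch(y ~ . | ., data = d, dist = "gaussian", left = 0,
    control = crch::crch.boost(maxit = 50, mstop = "bic"))

  print(fit$mstopopt)    # optimum stopping iterations for the criteria
  print(fit$iterations)  # number of boosting iterations
  print(fit$stepsize)    # boosting step size nu
  str(fit$coefficients)  # location and scale coefficients
}
```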

References

Messner JW, Mayr GJ, Zeileis A (2017). Non-Homogeneous Boosting for Predictor Selection in Ensemble Post-Processing. Monthly Weather Review, 145(1), 137–147. doi:10.1175/MWR-D-16-0088.1

See Also

crch, crch.control

Examples

library("crch")

# generate data
suppressWarnings(RNGversion("3.5.0"))
set.seed(5)
x <- matrix(rnorm(1000*20),1000,20)
y <- rnorm(1000, 1 + x[,1] - 1.5 * x[,2], exp(-1 + 0.3*x[,3]))
y <- pmax(0, y)
data <- data.frame(cbind(y, x))

# fit model with maximum likelihood
CRCH <- crch(y ~ .|., data = data, dist = "gaussian", left = 0)

# fit model with boosting
boost <- crch(y ~ .|.,  data = data, dist = "gaussian", left = 0,
  control = crch.boost(mstop = "aic"))

# more conveniently, the same model can also be fit through
# boost <- crch(y ~ .|.,  data = data, dist = "gaussian", left = 0,
#   method = "boosting", mstop = "aic")

# AIC comparison
AIC(CRCH, boost)
      df      AIC
CRCH  42 819.2673
boost  7 782.1219
# summary
summary(boost)

Call:
crch(formula = y ~ . | ., data = data, dist = "gaussian", left = 0, control = crch.boost(mstop = "aic"))

Standardized residuals:
    Min      1Q  Median      3Q     Max 
-2.9273 -0.2963  0.5462  1.4357 22.6650 

maximum stopping iteration: 100 

optimum stopping iterations:
max aic bic 
100  90  90 

Non-zero coefficients after 90 boosting iterations:
Location model:
(Intercept)           V2           V3          V13          V21  
    0.99111      1.01795     -1.49731      0.04483      0.03668  

Scale model with log link:
(Intercept)           V4  
    -0.9183       0.2226  

Distribution: gaussian
Log-likelihood: -384.1 on 7 Df
# plot
plot(boost)