Censored and Truncated Regression with Conditional Heteroscedasticity

Description

Fitting censored (tobit) or truncated regression models with conditional heteroscedasticity.

Usage

crch(formula, data, subset, na.action, weights, offset, 
  link.scale = c("log", "identity", "quadratic"),
  dist = c("gaussian", "logistic", "student"), df = NULL, 
  left = -Inf, right = Inf, truncated = FALSE, 
  type = c("ml", "crps"), control = crch.control(...), 
  model = TRUE, x = FALSE, y = FALSE, ...)

trch(formula, data, subset, na.action, weights, offset, 
  link.scale = c("log", "identity", "quadratic"),
  dist = c("gaussian", "logistic", "student"), df = NULL, 
  left = -Inf, right = Inf, truncated = TRUE, 
  type = c("ml", "crps"), control = crch.control(...), 
  model = TRUE, x = FALSE, y = FALSE, ...)

crch.fit(x, z, y, left, right, truncated = FALSE, dist = "gaussian",
  df = NULL, link.scale = "log", type = "ml", weights = NULL, offset = NULL, 
  control = crch.control()) 

Arguments

formula a formula expression of the form y ~ x | z where y is the response and x and z are regressor variables for the location and the scale of the fitted distribution respectively.
data an optional data frame containing the variables occurring in the formulas.
subset an optional vector specifying a subset of observations to be used for fitting.
na.action a function which indicates what should happen when the data contain NAs.
weights optional case weights in fitting.
offset optional numeric vector with a priori known component to be included in the linear predictor for the location. For crch.fit, offset can also be a list of 2 offsets used for the location and scale respectively.
link.scale character specification of the link function in the scale model. Currently, “identity”, “log”, “quadratic” are supported. The default is “log”. Alternatively, an object of class “link-glm” can be supplied.
dist assumed distribution for the dependent variable y.
df optional degrees of freedom for dist=“student”. If omitted the degrees of freedom are estimated.
left left limit for the censored dependent variable y. If set to -Inf, y is assumed not to be left-censored.
right right limit for the censored dependent variable y. If set to Inf, y is assumed not to be right-censored.
truncated logical. If TRUE, a truncated model is fitted with left and right interpreted as truncation points. If FALSE, a censored model is fitted. Default is FALSE.
type loss function to be optimized. Can be either “ml” for maximum likelihood (default) or “crps” for minimum continuous ranked probability score (CRPS).
control a list of control parameters passed to optim or to the internal boosting algorithm if control=crch.boost(). Default is crch.control().
model logical. If TRUE model frame is included as a component of the returned value.
x, y for crch: logical. If TRUE, the model matrix and response vector used for fitting are returned as components of the returned value. For crch.fit: x is a design matrix with regressors for the location and y is a vector of observations.
z a design matrix with regressors for the scale.
... arguments to be used to form the default control argument if it is not supplied directly.

Details

crch fits censored (tobit) or truncated regression models with conditional heteroscedasticity by maximum likelihood estimation or, with type = “crps”, by minimizing the continuous ranked probability score (CRPS). Student-t, Gaussian, and logistic distributions can be fitted to left- and/or right-censored or truncated responses. Different regressors can be used to model the location and the scale of this distribution. If control=crch.boost(), optimization is performed by boosting.
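To make the default estimation target concrete, the following base R sketch writes down the log-likelihood of a left-censored Gaussian model with a log link for the scale and maximizes it with optim. It is an illustration under stated assumptions, not the package's internal code: the simulated data, the helper nll, the starting values, and the use of the mean (rather than the sum) of the log-likelihood contributions are all choices made for this example.

```r
## Simulated data (assumption for illustration): the latent response has
## location 1 + 2 * x and log(scale) = -0.5 + 0.8 * z; values below 0 are
## recorded as 0 (left-censoring at 0).
set.seed(1)
n <- 2000
x <- rnorm(n)
z <- runif(n)
latent <- 1 + 2 * x + exp(-0.5 + 0.8 * z) * rnorm(n)
y <- pmax(latent, 0)

## Hypothetical helper: mean negative log-likelihood of a left-censored
## Gaussian model with log link for the scale.
nll <- function(par, y, x, z, left = 0) {
  mu    <- par[1] + par[2] * x        # location model
  sigma <- exp(par[3] + par[4] * z)   # scale model (log link)
  ll <- ifelse(y <= left,
    pnorm(left, mean = mu, sd = sigma, log.p = TRUE),  # censored: P(latent <= left)
    dnorm(y, mean = mu, sd = sigma, log = TRUE))       # uncensored: density at y
  -mean(ll)
}

## Crude starting values taken from the observed response
start <- c(mean(y), 0, log(sd(y)), 0)
fit <- optim(start, nll, y = y, x = x, z = z, method = "BFGS")
est <- setNames(fit$par, c("(Intercept)", "x", "(scale)_(Intercept)", "(scale)_z"))
round(est, 2)
```

The estimates should lie close to the simulated values 1, 2, -0.5, and 0.8. Averaging the contributions only rescales the objective and leaves its maximizer unchanged; it is used here to keep the first optimization steps moderate.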

trch is a wrapper function for crch with default truncated = TRUE.
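As a brief illustration of this relationship, the sketch below fits the same left-truncated model once with trch and once with crch and truncated = TRUE. Restricting the fit to positive observations through subset = rain > 0 is a choice made for this example, since a model truncated at 0 assumes that all responses lie above the truncation point.

```r
library("crch")
data("RainIbk", package = "crch")

## mean of the square root transformed ensemble forecasts
RainIbk$sqrtensmean <-
  apply(sqrt(RainIbk[, grep("^rainfc", names(RainIbk))]), 1, mean)

## truncated model via the wrapper ...
m_trch <- trch(sqrt(rain) ~ sqrtensmean, data = RainIbk,
  subset = rain > 0, left = 0)

## ... and via crch with truncated = TRUE
m_crch <- crch(sqrt(rain) ~ sqrtensmean, data = RainIbk,
  subset = rain > 0, left = 0, truncated = TRUE)

## the coefficients of the two fits are expected to agree
all.equal(coef(m_trch), coef(m_crch))
```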

crch.fit is the lower level function where the actual fitting takes place.
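A minimal sketch of calling crch.fit directly, using the argument names from the Usage section. The simulated data frame d and the matrices X and Z are assumptions made for this example; in normal use crch builds the design matrices from the formula and then hands them to crch.fit.

```r
library("crch")

## simulated left-censored data (assumption for illustration)
set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), z1 = runif(n))
d$y <- pmax(0, 1 + 2 * d$x1 + exp(-0.5 + 0.8 * d$z1) * rnorm(n))

## design matrices for the location (x) and the scale (z)
X <- model.matrix(~ x1, data = d)
Z <- model.matrix(~ z1, data = d)

## low-level fit: left and right have no defaults here, so both are supplied
f_fit <- crch.fit(x = X, z = Z, y = d$y, left = 0, right = Inf)

## corresponding formula interface
f_crch <- crch(y ~ x1 | z1, data = d, left = 0)

## compare the estimated coefficients of both fits
f_fit$coefficients
f_crch$coefficients
```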

Value

An object of class “crch” or “crch.boost”, i.e., a list with the following elements.

coefficients list of coefficients for location, scale, and df. Scale and df coefficients are in log-scale.
df if dist = “student”: degrees of freedom of the Student-t distribution, otherwise NULL.
residuals the residuals, that is response minus fitted values.
fitted.values list of fitted location and scale parameters.
dist assumed distribution for the dependent variable y.
cens list of censoring points.
optim output of the optimization performed by optim.
method optimization method used for optim.
type used loss function (maximum likelihood or minimum CRPS).
control list of control parameters passed to optim.
start starting values of coefficients used in the optimization.
weights case weights used for fitting.
offset list of offsets for location and scale.
n number of observations.
nobs number of observations with non-zero weights.
loglik log-likelihood.
vcov covariance matrix.
link a list with element “scale” containing the link objects for the scale model.
truncated logical indicating whether a truncated model has been fitted.
converged logical indicating whether the optimization has converged.
iterations number of iterations in optimization.
call function call.
formula the formula supplied.
terms the terms objects used.
levels list of levels of the factors used in fitting for location and scale respectively.
contrasts (where relevant) the contrasts used.
y if requested, the response used.
x if requested, the model matrix used.
model if requested, the model frame used.
stepsize, mstop, mstopopt, standardize return values of boosting optimization. See crch.boost for details.
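The components listed above can be read from the fitted object like elements of any other list. The sketch below assumes a non-boosted fit on the RainIbk data shipped with the package, and assumes that the coefficient and fitted-value lists carry elements named location and scale, following the descriptions above.

```r
library("crch")
data("RainIbk", package = "crch")

## mean and standard deviation of square root transformed ensemble forecasts
rainfc <- sqrt(RainIbk[, grep("^rainfc", names(RainIbk))])
RainIbk$sqrtensmean <- apply(rainfc, 1, mean)
RainIbk$sqrtenssd <- apply(rainfc, 1, sd)

fit <- crch(sqrt(rain) ~ sqrtensmean | sqrtenssd, data = RainIbk, left = 0)

fit$coefficients$location       # coefficients of the location model
fit$coefficients$scale          # coefficients of the scale model
head(fit$fitted.values$scale)   # fitted scale parameters, first observations
fit$loglik                      # log-likelihood at the optimum
fit$converged                   # did the optimization converge?
```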

References

Messner JW, Mayr GJ, Zeileis A (2016). Heteroscedastic Censored and Truncated Regression with crch. The R Journal, 8(1), 173–181. doi:10.32614/RJ-2016-012

Messner JW, Zeileis A, Broecker J, Mayr GJ (2014). Probabilistic Wind Power Forecasts with an Inverse Power Curve Transformation and Censored Regression. Wind Energy, 17(11), 1753–1766. doi:10.1002/we.1666

See Also

predict.crch, crch.control, crch.boost

Examples

library("crch")

data("RainIbk", package = "crch")
## mean and standard deviation of square root transformed ensemble forecasts
RainIbk$sqrtensmean <- 
  apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, mean)
RainIbk$sqrtenssd <- 
  apply(sqrt(RainIbk[,grep('^rainfc',names(RainIbk))]), 1, sd)

## fit linear regression model with Gaussian distribution 
CRCH <- crch(sqrt(rain) ~ sqrtensmean, data = RainIbk, dist = "gaussian")
## same as lm?
all.equal(
  coef(lm(sqrt(rain) ~ sqrtensmean, data = RainIbk)),
  head(coef(CRCH), -1),
  tol = 1e-6)
[1] TRUE
## print
CRCH

Call:
crch(formula = sqrt(rain) ~ sqrtensmean, data = RainIbk, dist = "gaussian")

Coefficients (location model):
(Intercept)  sqrtensmean  
     0.1468       0.5817  

Coefficients (scale model with log link):
(Intercept)  
     0.4945  

Distribution: gaussian
## summary
summary(CRCH)

Call:
crch(formula = sqrt(rain) ~ sqrtensmean, data = RainIbk, dist = "gaussian")

Standardized residuals:
    Min      1Q  Median      3Q     Max 
-2.4256 -0.7120 -0.1562  0.5786  4.8408 

Coefficients (location model):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.14683    0.05456   2.691  0.00713 ** 
sqrtensmean  0.58173    0.01540  37.781  < 2e-16 ***

Coefficients (scale model with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.49454    0.01003   49.31   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Distribution: gaussian
Log-likelihood: -9512 on 3 Df
Number of iterations in BFGS optimization: 3 
## left censored regression model with censoring point 0:
CRCH2 <- crch(sqrt(rain) ~ sqrtensmean, data = RainIbk, 
  dist = "gaussian", left = 0)

## left censored regression model with censoring point 0 and 
## conditional heteroscedasticity:
CRCH3 <- crch(sqrt(rain) ~ sqrtensmean|sqrtenssd, data = RainIbk, 
  dist = "gaussian",  left = 0)

## left censored regression model with censoring point 0 and 
## conditional heteroscedasticity with logistic distribution:
CRCH4 <- crch(sqrt(rain) ~ sqrtensmean|sqrtenssd, data = RainIbk, 
  dist = "logistic", left = 0)

## compare AIC 
AIC(CRCH, CRCH2, CRCH3, CRCH4)
      df      AIC
CRCH   3 19029.75
CRCH2  3 17961.76
CRCH3  4 17914.41
CRCH4  4 17867.35
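
The AIC values in this table follow from the log-likelihood and the number of estimated parameters. As a quick check, the sketch below recomputes the value for CRCH from the numbers printed by summary(CRCH) above; the variable names are chosen for this illustration only.

```r
## values copied from the summary(CRCH) output above
loglik <- -9512   # printed log-likelihood (rounded)
k <- 3            # two location coefficients plus one scale coefficient

## AIC = -2 * log-likelihood + 2 * number of parameters
aic <- -2 * loglik + 2 * k
aic
```

This gives 19030, which agrees with the tabulated 19029.75 up to the rounding of the printed log-likelihood.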