Zero-Truncated Count Data Regression

Description

Fit zero-truncated regression models for count data via maximum likelihood.

Usage

zerotrunc(formula, data, subset, na.action, weights, offset,
  dist = c("poisson", "negbin", "geometric"), theta = Inf,
  control = zerotrunc.control(...),
  model = TRUE, y = TRUE, x = FALSE, ...)

Arguments

formula symbolic description of the model.
data, subset, na.action arguments controlling formula processing via model.frame.
weights optional numeric vector of weights.
offset optional numeric vector with an a priori known component to be included in the linear predictor.
dist character specification of the count distribution family.
theta numeric. Alternative (and more flexible) specification of the count distribution family. Some values correspond to dist values: theta = Inf (“poisson”), theta = 1 (“geometric”), theta = NULL (“negbin”). Any other non-negative value for theta is also allowed. Specify either theta or dist, not both.
control a list of control arguments specified via zerotrunc.control.
model, y, x logicals. If TRUE the corresponding components of the fit (model frame, response, model matrix) are returned.
... arguments passed to zerotrunc.control in the default setup.
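
The correspondence between theta and dist described above can be checked numerically with base R's negative binomial density. This is a minimal sketch that does not call zerotrunc; it assumes the usual parametrization with mean mu and size theta, and the helper dztnbinom is a hypothetical name introduced here for illustration only.

```r
# Assumed parametrization: negative binomial with mean mu and size theta.
mu <- 2.5
y <- 0:10

# theta = 1 reproduces the geometric distribution with prob = 1 / (1 + mu).
diff_geom <- max(abs(dnbinom(y, size = 1, mu = mu) -
                     dgeom(y, prob = 1 / (1 + mu))))

# A very large theta approaches the Poisson distribution (the theta = Inf limit).
diff_pois <- max(abs(dnbinom(y, size = 1e8, mu = mu) -
                     dpois(y, lambda = mu)))

# Hypothetical helper: zero-truncated density f(y) / (1 - f(0)) for y >= 1.
dztnbinom <- function(y, mu, theta) {
  dnbinom(y, size = theta, mu = mu) / (1 - dnbinom(0, size = theta, mu = mu))
}

# The truncated probabilities over y = 1, 2, ... should sum to (nearly) one.
total <- sum(dztnbinom(1:1000, mu = mu, theta = 4))
```

Both differences should be numerically negligible, which is why a single theta argument can cover all three named families.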

Details

All zero-truncated count data models in zerotrunc are obtained from the corresponding untruncated distribution using a log-link between the mean of the untruncated distribution and the linear predictor. All parameters are estimated by maximum likelihood using optim, with control options set in zerotrunc.control. Starting values can be supplied; otherwise they are estimated by glm.fit (the default). Standard errors are derived numerically using the Hessian matrix returned by optim. See zerotrunc.control for details.
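
The estimation strategy described above can be sketched in base R for the zero-truncated Poisson case. This is an illustrative sketch under stated assumptions, not the package's actual implementation: the simulated data, the object names (X, beta_true, negloglik), and the choice of BFGS are all assumptions made here. The log-likelihood contribution of each observation is log f(y) - log(1 - f(0)), where f is the untruncated Poisson density with mean mu = exp(eta).

```r
set.seed(1)

# Simulated regressors and an untruncated mean with a log link (assumed setup).
n <- 500
x <- rnorm(n)
X <- cbind("(Intercept)" = 1, x = x)
beta_true <- c(0.5, 0.3)
mu <- exp(drop(X %*% beta_true))

# Draw zero-truncated counts by redrawing every zero until none remain.
y <- rpois(n, mu)
zero <- y == 0
while (any(zero)) {
  y[zero] <- rpois(sum(zero), mu[zero])
  zero <- y == 0
}

# Negative log-likelihood of the zero-truncated Poisson model.
negloglik <- function(beta) {
  mu <- exp(drop(X %*% beta))
  # log f(y) - log(1 - f(0)), with f(0) = exp(-mu)
  -sum(dpois(y, lambda = mu, log = TRUE) - log1p(-exp(-mu)))
}

# Starting values from an ordinary Poisson fit via glm.fit.
start <- glm.fit(X, y, family = poisson())$coefficients

# Maximize the likelihood with optim and request the numerical Hessian.
fit <- optim(start, negloglik, method = "BFGS", hessian = TRUE)

# Standard errors from the inverse Hessian of the negative log-likelihood.
se <- sqrt(diag(solve(fit$hessian)))
cbind(Estimate = fit$par, "Std. Error" = se)
```

Because the objective is the negative log-likelihood, the inverse of its Hessian at the optimum serves directly as the estimated covariance matrix of the coefficients.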

The returned fitted model object is of class “zerotrunc” and is similar to fitted “glm” objects.

A set of standard extractor functions for fitted model objects is available for objects of class “zerotrunc”, including methods for the generic functions print, summary, coef, vcov, logLik, residuals, predict, fitted, terms, model.frame, model.matrix. See predict.zerotrunc for more details on all methods.

Value

An object of class “zerotrunc”, i.e., a list with components including

coefficients estimated coefficients,
residuals a vector of raw residuals (observed - fitted),
fitted.values a vector of fitted means,
optim a list with the output from the optim call for minimizing the negative log-likelihood,
control the control arguments passed to the optim call,
start the starting values for the parameters passed to the optim call(s),
weights the case weights used (if any),
offset the offset vector used (if any),
n number of observations,
df.null residual degrees of freedom for the null model,
df.residual residual degrees of freedom for fitted model,
terms the terms object for the model,
theta (estimated) \(\theta\) parameter of the negative binomial model,
SE.logtheta standard error for \(\log(\theta)\),
loglik log-likelihood of the fitted model,
vcov covariance matrix of the coefficients in the model (derived from the Hessian of the optim output),
dist character describing the distribution used,
converged logical indicating successful convergence of optim,
call the original function call,
formula the original formula,
levels levels of the categorical regressors,
contrasts contrasts corresponding to levels from the model,
model the model frame (if model = TRUE),
y the response count vector (if y = TRUE),
x model matrix (if x = TRUE).

References

Cameron AC, Trivedi PK (2013). Regression Analysis of Count Data, 2nd ed. New York: Cambridge University Press.

Zeileis A, Kleiber C, Jackman S (2008). “Regression Models for Count Data in R.” Journal of Statistical Software, 27(8), 1–25. doi:10.18637/jss.v027.i08.

See Also

zerotrunc.control, glm, glm.fit, glm.nb, zeroinfl, hurdle

Examples

library("countreg")

## data
data("CrabSatellites", package = "countreg")
cs <- CrabSatellites[, c("satellites", "width", "color")]
cs$color <- as.numeric(cs$color)
cs <- subset(cs, subset = satellites > 0)

## poisson
zt_p <- zerotrunc(satellites ~ ., data = cs)
## or equivalently
zt_p <- zerotrunc(satellites ~ ., data = cs, theta = Inf)
summary(zt_p)

Call:
zerotrunc(formula = satellites ~ ., data = cs, theta = Inf)

Deviance residuals:
    Min      1Q  Median      3Q     Max 
-2.5409 -0.9350 -0.2051  0.6278  3.7722 

Coefficients (truncated poisson with log link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.562699   0.645439   0.872    0.383
width       0.034238   0.022227   1.540    0.123
color       0.007166   0.066627   0.108    0.914

Number of iterations in BFGS optimization: 6 
Log-likelihood: -267.5 on 3 Df

## negbin
zt_nb <- zerotrunc(satellites ~ ., data = cs, dist = "negbin")
## or equivalently
zt_nb <- zerotrunc(satellites ~ ., data = cs, theta = NULL)
summary(zt_nb)

Call:
zerotrunc(formula = satellites ~ ., data = cs, theta = NULL)

Deviance residuals:
    Min      1Q  Median      3Q     Max 
-2.1636 -0.7158 -0.1520  0.4498  2.4215 

Coefficients (truncated negbin with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 0.427224   0.941131   0.454    0.650    
width       0.037890   0.032751   1.157    0.247    
color       0.006985   0.091081   0.077    0.939    
Log(theta)  1.527243   0.352937   4.327 1.51e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Theta = 4.6055
Number of iterations in BFGS optimization: 10 
Log-likelihood: -255.8 on 4 Df
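
The summary above reports the dispersion parameter on the log scale. As a worked illustration, the following sketch converts the printed Log(theta) estimate and its standard error back to the theta scale. The delta-method standard error and the Wald interval are standard approximations chosen here for illustration, not quantities reported by summary; the two input numbers are copied from the printed output.

```r
# Values copied from the printed summary of zt_nb above.
log_theta <- 1.527243
se_log_theta <- 0.352937

# Back-transform the point estimate; this should match the printed Theta.
theta_hat <- exp(log_theta)

# Delta-method approximation: SE(theta) is roughly theta * SE(log(theta)).
se_theta <- theta_hat * se_log_theta

# Wald 95% interval built on the log scale, then exponentiated so that
# both limits stay positive.
ci_theta <- exp(log_theta + c(-1, 1) * qnorm(0.975) * se_log_theta)
```

Building the interval on the log scale respects the positivity of theta, which a symmetric interval around theta_hat would not guarantee.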