Negative Binomial Count Data Regression

Description

Fit negative binomial regression models for count data via maximum likelihood

Usage

nbreg(formula, data, subset, na.action, weights, offset, theta = NULL,
  dist = "NB2", link = "log", link.theta = "log", control = nbreg.control(...),
  model = TRUE, y = TRUE, x = FALSE, z = FALSE, hessA = TRUE, ...)

Arguments

formula symbolic description of the model, see details.
data, subset, na.action arguments controlling formula processing via model.frame.
weights optional numeric vector of weights.
offset optional numeric vector with an a priori known component to be included in the linear predictor. See below for more information on offsets.
theta numeric. Optional. If specified, then the dispersion parameter is not estimated.
dist character specification of the NB type. Either “NB2” or “NB1”. Lowercase versions “nb2” and “nb1” are equivalent.
link character specification of the link function for the mean. Currently, only “log” is supported.
link.theta character specification of the link function for the dispersion parameter. Currently, only “log” is supported.
control a list of control arguments specified via nbreg.control.
model, y, x, z logicals. If TRUE the corresponding components of the fit (model frame, response, model matrix) are returned.
hessA logical. If TRUE, then the analytical Hessian is used to compute the covariance matrix of the estimator.
currently not used.

Details

The Negative Binomial Distribution is often used to model count data with overdispersion. Cameron and Trivedi (2013) offer two parametrization, negative binomial type 2 (NB2) and type 1 (NB1). NB2 is parametrized as follows

\(f(y | \mu, \theta) = \dfrac{\Gamma(\theta + y)}{\Gamma(\theta) y!} \left(\dfrac{\theta}{\theta + \mu}\right)^\theta \left(\dfrac{\mu}{\theta + \mu}\right)^y.\)

For NB1 replace \(\theta\) with \(\mu\theta\) on the RHS.

This function further allows us to model the dispersion parameter with covariates via a two-part formula. If a formula of type y ~ x1 + x2 is supplied, then the regressors are employed in the mean and the dispersion parameter is estimated as a constant, i.e. a standard NB2 or NB1 is estimated. This is equivalent to y ~ x1 + x2 | 1. If a formula of type y ~ x1 + x2 + x3 | z1 + z2 is given, then the mean \(mu\) is modeled using x1 + x2 and the dispersion parameter \(\theta\) with z1 + z2. If dist = “NB2”, then the function estimates the NBH model.

Offsets can be specified in both the mean and dispersion parameter \(\theta\): y ~ x1 + x2 + offset(x3) | z1 + offset(z2), where x3 is used as an offset (i.e., with coefficient fixed to 1) in the mean \(mu\) and z2 analogously in \(\theta\). By the rule stated above y ~ x1 + offset(x2) is equivalent to y ~ x1 + offset(x2) | 1. Instead of using the offset() wrapper within the formula, the offset argument can also be employed which sets an offset only for \(mu\). Thus, formula = y ~ x1 and offset = x2 is equivalent to formula = y ~ x1 + offset(x2) | 1.

All parameters are estimated by maximum likelihood using optim, with control options set in nbreg.control. Starting values can be supplied or are estimated by a Poisson regression in glm.fit (the default, starting values of coefficients in \(\theta\) are set to zero to ensure compatibility with NB1). Standard errors are derived analytically or numerically using the Hessian matrix returned by
optim. See nbreg.control for details.

The returned fitted model object is of class “nbreg” and is similar to fitted “glm” objects.

A set of standard extractor functions for fitted model objects is available for objects of class “nbreg”, including methods to the generic functions print, summary, coef, vcov, logLik, residuals, predict, fitted, terms, model.matrix. See predict.nbreg for more details on all methods.

Value

An object of class “nbreg”, i.e., a list with components including

coefficients a vector containing the coefficients from the mean,
coefficients.theta a vector containing the coefficients from the dispersion parameter theta,
residuals a vector of raw residuals (observed - fitted),
fitted.values a vector of fitted means,
optim a list with the output from the optim call for maximizing the log-likelihood,
control the control arguments passed to the optim call,
start the starting values for the parameters passed to the optim call,
weights the case weights used,
offset a list with elements “mu” and “theta” containing the offset vectors (if any) from the respective parameters,
n number of observations (with weights > 0),
df.null residual degrees of freedom for the null model,
df.residual residual degrees of freedom for fitted model,
terms a list with elements “mu”, “theta” and “full” containing the terms objects for the respective parameters,
SE.logtheta standard error for \(\log(\theta)\),
loglik log-likelihood of the fitted model,
vcov covariance matrix of all coefficients in the model (derived from the analytical Hessian (hessA = TRUE) or from the Hessian of the optim output (hessA = FALSE)),
dist character string describing the type of NB distribution used,
link character string describing the link of the mean,
link.theta character string describing the link of the dispersion parameter theta,
converged logical indicating successful convergence of optim,
call the original function call,
formula the original formula,
levels levels of the categorical regressors,
contrasts a list with elements “mu” and “theta” containing the contrasts corresponding to levels from the respective parts,
model the full model frame (if model = TRUE),
y the response count vector (if y = TRUE),
x the model matrix for the mean (if x = TRUE),
z the model matrix for the mean (if z = TRUE),

References

Cameron AC, Trivedi PK (2013). Regression Analysis of Count Data, 2nd ed. New York: Cambridge University Press.

See Also

nbreg.control, glm, glm.fit, glm.nb,

Examples

library("countreg")

data("CrabSatellites", package = "countreg")

## NB2
fm_nb2 <- nbreg(satellites ~ width + color, data = CrabSatellites)

## NB1
fm_nb1 <- nbreg(satellites ~ width + color, data = CrabSatellites, dist = "NB1")

## NBH
fm_nbh <- nbreg(satellites ~ width + color | weight, data = CrabSatellites)

## NB1 with variable theta
fm_nb1h <- nbreg(satellites ~ width + color | weight, data = CrabSatellites,
                dist = "NB1")

## Example not run:
## data
# data("GSOEP", package = "countreg")
# gsoep <- subset(GSOEP, year == "1984")

## NB2
# fm_nb2 <- nbreg(docvis ~ educ + public + addon,
#                 data = gsoep)
                
## NB1
# fm_nb1 <- nbreg(docvis ~ educ + public + addon,
#                 data = gsoep, dist = "NB1")                

## NBH
# fm_nbh <- nbreg(docvis ~ educ + public + addon | married + public,
#                 data = gsoep)
                
## NB1 with variable theta
# fm_nb1h <- nbreg(docvis ~ educ + public + addon | married + public,
#                 data = gsoep, dist = "NB1")