countreg

Negative Binomial Count Data Regression

Description

Fit negative binomial regression models for count data via maximum likelihood

Usage

nbreg(formula, data, subset, na.action, weights, offset, theta = NULL,
  dist = "NB2", link = "log", link.theta = "log", control = nbreg.control(...),
  model = TRUE, y = TRUE, x = FALSE, z = FALSE, hessA = TRUE, ...)

Arguments

`formula`	symbolic description of the model, see details.
`data`, `subset`, `na.action`	arguments controlling formula processing via `model.frame`.
`weights`	optional numeric vector of weights.
`offset`	optional numeric vector with an a priori known component to be included in the linear predictor. See below for more information on offsets.
`theta`	numeric. Optional. If specified, then the dispersion parameter is not estimated.
`dist`	character specification of the NB type. Either `“NB2”` or `“NB1”`. Lowercase versions `“nb2”` and `“nb1”` are equivalent.
`link`	character specification of the link function for the mean. Currently, only `“log”` is supported.
`link.theta`	character specification of the link function for the dispersion parameter. Currently, only `“log”` is supported.
`control`	a list of control arguments specified via `nbreg.control`.
`model`, `y`, `x`, `z`	logicals. If `TRUE` the corresponding components of the fit (model frame, response, model matrix) are returned.
`hessA`	logical. If `TRUE`, then the analytical Hessian is used to compute the covariance matrix of the estimator.
`…`	currently not used.

Details

The Negative Binomial Distribution is often used to model count data with overdispersion. Cameron and Trivedi (2013) offer two parametrization, negative binomial type 2 (NB2) and type 1 (NB1). NB2 is parametrized as follows

\(f(y | \mu, \theta) = \dfrac{\Gamma(\theta + y)}{\Gamma(\theta) y!} \left(\dfrac{\theta}{\theta + \mu}\right)^\theta \left(\dfrac{\mu}{\theta + \mu}\right)^y.\)

For NB1 replace \(\theta\) with \(\mu\theta\) on the RHS.

This function further allows us to model the dispersion parameter with covariates via a two-part formula. If a formula of type y ~ x1 + x2 is supplied, then the regressors are employed in the mean and the dispersion parameter is estimated as a constant, i.e. a standard NB2 or NB1 is estimated. This is equivalent to y ~ x1 + x2 | 1. If a formula of type y ~ x1 + x2 + x3 | z1 + z2 is given, then the mean \(mu\) is modeled using x1 + x2 and the dispersion parameter \(\theta\) with z1 + z2. If dist = “NB2”, then the function estimates the NBH model.

Offsets can be specified in both the mean and dispersion parameter \(\theta\): y ~ x1 + x2 + offset(x3) | z1 + offset(z2), where x3 is used as an offset (i.e., with coefficient fixed to 1) in the mean \(mu\) and z2 analogously in \(\theta\). By the rule stated above y ~ x1 + offset(x2) is equivalent to y ~ x1 + offset(x2) | 1. Instead of using the offset() wrapper within the formula, the offset argument can also be employed which sets an offset only for \(mu\). Thus, formula = y ~ x1 and offset = x2 is equivalent to formula = y ~ x1 + offset(x2) | 1.

All parameters are estimated by maximum likelihood using optim, with control options set in nbreg.control. Starting values can be supplied or are estimated by a Poisson regression in glm.fit (the default, starting values of coefficients in \(\theta\) are set to zero to ensure compatibility with NB1). Standard errors are derived analytically or numerically using the Hessian matrix returned by
optim. See nbreg.control for details.

The returned fitted model object is of class “nbreg” and is similar to fitted “glm” objects.

A set of standard extractor functions for fitted model objects is available for objects of class “nbreg”, including methods to the generic functions print, summary, coef, vcov, logLik, residuals, predict, fitted, terms, model.matrix. See predict.nbreg for more details on all methods.

Value

An object of class “nbreg”, i.e., a list with components including

`coefficients`	a vector containing the coefficients from the mean,
`coefficients.theta`	a vector containing the coefficients from the dispersion parameter theta,
`residuals`	a vector of raw residuals (observed - fitted),
`fitted.values`	a vector of fitted means,
`optim`	a list with the output from the `optim` call for maximizing the log-likelihood,
`control`	the control arguments passed to the `optim` call,
`start`	the starting values for the parameters passed to the `optim` call,
`weights`	the case weights used,
`offset`	a list with elements `“mu”` and `“theta”` containing the offset vectors (if any) from the respective parameters,
`n`	number of observations (with weights > 0),
`df.null`	residual degrees of freedom for the null model,
`df.residual`	residual degrees of freedom for fitted model,
`terms`	a list with elements `“mu”`, `“theta”` and `“full”` containing the terms objects for the respective parameters,
`SE.logtheta`	standard error for \(\log(\theta)\),
`loglik`	log-likelihood of the fitted model,
`vcov`	covariance matrix of all coefficients in the model (derived from the analytical Hessian (`hessA = TRUE`) or from the Hessian of the `optim` output (`hessA = FALSE`)),
`dist`	character string describing the type of NB distribution used,
`link`	character string describing the link of the mean,
`link.theta`	character string describing the link of the dispersion parameter theta,
`converged`	logical indicating successful convergence of `optim`,
`call`	the original function call,
`formula`	the original formula,
`levels`	levels of the categorical regressors,
`contrasts`	a list with elements `“mu”` and `“theta”` containing the contrasts corresponding to `levels` from the respective parts,
`model`	the full model frame (if `model = TRUE`),
`y`	the response count vector (if `y = TRUE`),
`x`	the model matrix for the mean (if `x = TRUE`),
`z`	the model matrix for the mean (if `z = TRUE`),

References

Cameron AC, Trivedi PK (2013). Regression Analysis of Count Data, 2nd ed. New York: Cambridge University Press.

Examples

library("countreg")

data("CrabSatellites", package = "countreg")

## NB2
fm_nb2 <- nbreg(satellites ~ width + color, data = CrabSatellites)

## NB1
fm_nb1 <- nbreg(satellites ~ width + color, data = CrabSatellites, dist = "NB1")

## NBH
fm_nbh <- nbreg(satellites ~ width + color | weight, data = CrabSatellites)

## NB1 with variable theta
fm_nb1h <- nbreg(satellites ~ width + color | weight, data = CrabSatellites,
                dist = "NB1")

## Example not run:
## data
# data("GSOEP", package = "countreg")
# gsoep <- subset(GSOEP, year == "1984")

## NB2
# fm_nb2 <- nbreg(docvis ~ educ + public + addon,
#                 data = gsoep)
                
## NB1
# fm_nb1 <- nbreg(docvis ~ educ + public + addon,
#                 data = gsoep, dist = "NB1")                

## NBH
# fm_nbh <- nbreg(docvis ~ educ + public + addon | married + public,
#                 data = gsoep)
                
## NB1 with variable theta
# fm_nb1h <- nbreg(docvis ~ educ + public + addon | married + public,
#                 data = gsoep, dist = "NB1")