Fit beta regression models for rates and proportions via maximum likelihood using a parametrization with mean (depending through a link function on the covariates) and precision parameter (called phi).
Usage
betareg(formula, data, subset, na.action, weights, offset,
link = c("logit", "probit", "cloglog", "cauchit", "log", "loglog"),
link.phi = NULL, type = c("ML", "BC", "BR"), dist = NULL, nu = NULL,
control = betareg.control(...), model = TRUE,
y = TRUE, x = FALSE, ...)
betareg.fit(x, y, z = NULL, weights = NULL, offset = NULL,
link = "logit", link.phi = "log", type = "ML", control = betareg.control(),
dist = NULL, nu = NULL)
Arguments
formula
symbolic description of the model, either of type y ~ x (mean submodel, constant precision) or y ~ x | z (submodels for both mean and precision); for details see below.
data, subset, na.action
arguments controlling formula processing via model.frame.
weights
optional numeric vector of case weights.
offset
optional numeric vector with an a priori known component to be included in the linear predictor for the mean. In betareg.fit, offset may also be a list of two offsets for the mean and precision equation, respectively.
link
character specification of the link function in the mean model (mu). Currently, “logit”, “probit”, “cloglog”, “cauchit”, “log”, “loglog” are supported. Alternatively, an object of class “link-glm” can be supplied.
link.phi
character specification of the link function in the precision model (phi). Currently, “identity”, “log”, “sqrt” are supported. The default is “log” unless formula is of type y ~ x where the default is “identity” (for backward compatibility). Alternatively, an object of class “link-glm” can be supplied.
type
character specification of the type of estimator. Currently, maximum likelihood (“ML”), ML with bias correction (“BC”), and ML with bias reduction (“BR”) are supported.
dist
character specification of the response distribution. Usually, this does not have to be set by the user because by default the classical “beta” distribution is used when all observations for the dependent variable are in (0, 1). In the presence of boundary observations (0 or 1, which cannot be accomodated by “beta”) the extended-support beta mixture distribution (“xbetax”) is used. Additionally, dist = “xbeta” can be used with fixed exceedence parameter nu, mostly for testing and debugging purposes.
nu
numeric. The fixed value of the expected exceedence parameter nu in case the extended-support beta mixture distribution is used. By default, nu does not need to be specified and is estimated if needed. So setting nu is mostly for profiling and debugging.
control
a list of control arguments specified via betareg.control.
model, y, x
logicals. If TRUE the corresponding components of the fit (model frame, response, model matrix) are returned. For betareg.fit, x should be a numeric regressor matrix and y should be the numeric response vector (with values in (0,1)).
z
numeric matrix. Regressor matrix for the precision model, defaulting to an intercept only.
…
arguments passed to betareg.control.
Details
Beta regression as suggested by Ferrari and Cribari-Neto (2004) and extended by Simas, Barreto-Souza, and Rocha (2010) is implemented in betareg. It is useful in situations where the dependent variable is continuous and restricted to the unit interval (0, 1), e.g., resulting from rates or proportions. It is modeled to be beta-distributed with parametrization using mean and precision parameter (called mu and phi, respectively). The mean mu is linked, as in generalized linear models (GLMs), to the explanatory variables through a link function and a linear predictor. Additionally, the precision parameter phi can be linked to another (potentially overlapping) set of regressors through a second link function, resulting in a model with variable dispersion (see Cribari-Neto and Zeileis 2010). Estimation is performed by default using maximum likelihood (ML) via optim with analytical gradients and starting values from an auxiliary linear regression of the transformed response. Subsequently, the optim result may be enhanced by an additional Fisher scoring iteration using analytical gradients and expected information. Alternative estimation methods are bias-corrected (BC) or bias-reduced (BR) maximum likelihood (see Grün, Kosmidis, and Zeileis 2012). For ML and BC the Fisher scoring is just a refinement to move the gradients even closer to zero and can be disabled by setting fsmaxit = 0 in the control arguments. For BR the Fisher scoring is needed to solve the bias-adjusted estimating equations.
In the beta regression as introduced by Ferrari and Cribari-Neto (2004), the mean of the response is linked to a linear predictor described by y ~ x1 + x2 using a link function while the precision parameter phi is assumed to be constant. Simas et al. (2009) suggest to extend this model by linking phi to an additional set of regressors (z1 + z2, say): In betareg this can be specified in a formula of type y ~ x1 + x2 | z1 + z2 where the regressors in the two parts can be overlapping. In the precision model (for phi), the link function link.phi is used. The default is a “log” link unless no precision model is specified. In the latter case (i.e., when the formula is of type y ~ x1 + x2), the “identity” link is used by default for backward compatibility.
Kosmidis and Zeileis (2024) introduce a generalization of the classic beta regression model with extended support [0, 1]. Specifically, the extended-support beta distribution (“xbeta”) leverages an underlying symmetric four-parameter beta distribution with exceedence parameter nu to obtain support [-nu, 1 + nu] that is subsequently censored to [0, 1] in order to obtain point masses at the boundary values 0 and 1. The extended-support beta mixture distribution (“xbetax”) is a continuous mixture of extended-support beta distributions where the exceedence parameter follows an exponential distribution with mean nu (rather than a fixed value of nu). The latter “xbetax” specification is used by default in case of boundary observations at 0 and/or 1. The “xbeta” specification with fixed nu is mostly for testing and debugging purposes.
A set of standard extractor functions for fitted model objects is available for objects of class “betareg”, including methods to the generic functions print, summary, plot, coef, vcov, logLik, residuals, predict, terms, model.frame, model.matrix, cooks.distance and hatvalues (see influence.measures), gleverage (new generic), estfun and bread (from the sandwich package), and coeftest (from the lmtest package).
See predict.betareg, residuals.betareg, plot.betareg, and summary.betareg for more details on all methods.
The main parameters of interest are the coefficients in the linear predictor of the mean model. The additional parameters in the precision model (phi) can either be treated as full model parameters (default) or as nuisance parameters. In the latter case the estimation does not change, only the reported information in output from print, summary, or coef (among others) will be different. See also betareg.control.
The implemented algorithms for bias correction/reduction follow Kosmidis and Firth (2010). Technical note: In case, either bias correction or reduction is requested, the second derivative of the inverse link function is required for link and link.phi. If the two links are specified by their names (as done by default in betareg), then the “link-glm” objects are enhanced automatically by the required additional d2mu.deta function. However, if a “link-glm” object is supplied directly by the user, it needs to have the d2mu.deta function or, for backward compatibility, dmu.deta.
The original version of the package was written by Alexandre B. Simas and Andrea V. Rocha (up to version 1.2). Starting from version 2.0-0 the code was rewritten by Achim Zeileis.
Value
betareg returns an object of class “betareg”, i.e., a list with components as follows. For classic beta regressions (dist = “beta”) several elements are lists with the names “mean” and “precision” for the information from the respective submodels. For extended-support beta regressions (dist = “xbetax” or “xbeta”), the corresponding names are “mu” and “phi” because they are not exactly the mean and precision due to the censoring in the response variable.
betareg.fit returns an unclassed list with components up to converged.
coefficients
a list with elements “mean” (or “mu”) and “precision” (or “phi”) containing the coefficients from the respective submodels and for extended-support beta regressions an additional element “nu”,
residuals
a vector of raw residuals (observed - fitted),
fitted.values
a vector of fitted means,
optim
output from the optim call for maximizing the log-likelihood(s),
method
the method argument passed to the optim call,
control
the control arguments passed to the optim call,
start
the starting values for the parameters passed to the optim call,
weights
the weights used (if any),
offset
a list of offset vectors used (if any),
n
number of observations,
nobs
number of observations with non-zero weights,
df.null
residual degrees of freedom in the null model (constant mean and dispersion), i.e., n - 2,
df.residual
residual degrees of freedom in the fitted model,
phi
logical indicating whether the precision (phi) coefficients will be treated as full model parameters or nuisance parameters in subsequent calls to print, summary, coef etc.,
loglik
log-likelihood of the fitted model,
vcov
covariance matrix of all parameters in the model,
pseudo.r.squared
pseudo R-squared value (squared correlation of linear predictor and link-transformed response),
link
a list with elements “mean” (or “mu”) and “precision” (or “phi”) containing the link objects for the respective submodels,
converged
logical indicating successful convergence of optim,
call
the original function call,
formula
the original formula,
terms
a list with elements “mean” (or “mu”), “precision” (or “phi”) and “full” containing the terms objects for the respective models,
levels
a list with elements “mean” (or “mu”), “precision” (or “phi”) and “full” containing the levels of the categorical regressors,
contrasts
a list with elements “mean” (or “mu”) and “precision” (or “phi”) containing the contrasts corresponding to levels from the respective models,
model
the full model frame (if model = TRUE),
y
the response proportion vector (if y = TRUE),
x
a list with elements “mean” (or “mu”) and “precision” (or “phi”) containing the model matrices from the respective models (if x = TRUE).
References
Cribari-Neto F, Zeileis A (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1–24. doi:10.18637/jss.v034.i02
Ferrari SLP, Cribari-Neto F (2004). Beta Regression for Modeling Rates and Proportions. Journal of Applied Statistics, 31(7), 799–815.
Grün B, Kosmidis I, Zeileis A (2012). Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software, 48(11), 1–25. doi:10.18637/jss.v048.i11
Kosmidis I, Firth D (2010). A Generic Algorithm for Reducing Bias in Parametric Estimation. Electronic Journal of Statistics, 4, 1097–1112.
Kosmidis I, Zeileis A (2024). Extended-Support Beta Regression for [0, 1] Responses. 2409.07233, arXiv.org E-Print Archive. doi:10.48550/arXiv.2409.07233
Simas AB, Barreto-Souza W, Rocha AV (2010). Improved Estimators for a General Class of Beta Regression Models. Computational Statistics & Data Analysis, 54(2), 348–366.
See Also
summary.betareg, predict.betareg, residuals.betareg, Formula
Examples
library("betareg")options(digits =4)## Section 4 from Ferrari and Cribari-Neto (2004)data("GasolineYield", package ="betareg")data("FoodExpenditure", package ="betareg")## Table 1gy<-betareg(yield~batch+temp, data =GasolineYield)summary(gy)
Call:
betareg(formula = yield ~ batch + temp, data = GasolineYield)
Quantile residuals:
Min 1Q Median 3Q Max
-2.140 -0.570 0.120 0.704 1.751
Coefficients (mean model with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.159571 0.182325 -33.78 < 2e-16 ***
batch1 1.727729 0.101229 17.07 < 2e-16 ***
batch2 1.322597 0.117902 11.22 < 2e-16 ***
batch3 1.572310 0.116105 13.54 < 2e-16 ***
batch4 1.059714 0.102360 10.35 < 2e-16 ***
batch5 1.133752 0.103523 10.95 < 2e-16 ***
batch6 1.040162 0.106036 9.81 < 2e-16 ***
batch7 0.543692 0.109127 4.98 6.3e-07 ***
batch8 0.495901 0.108926 4.55 5.3e-06 ***
batch9 0.385793 0.118593 3.25 0.0011 **
temp 0.010967 0.000413 26.58 < 2e-16 ***
Phi coefficients (precision model with identity link):
Estimate Std. Error z value Pr(>|z|)
(phi) 440 110 4 6.3e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Type of estimator: ML (maximum likelihood)
Log-likelihood: 84.8 on 12 Df
Pseudo R-squared: 0.962
Number of iterations: 51 (BFGS) + 3 (Fisher scoring)
## Table 2fe_lin<-lm(I(food/income)~income+persons, data =FoodExpenditure)library("lmtest")bptest(fe_lin)
studentized Breusch-Pagan test
data: fe_lin
BP = 5.9, df = 2, p-value = 0.05
fe_beta<-betareg(I(food/income)~income+persons, data =FoodExpenditure)summary(fe_beta)
Call:
betareg(formula = I(food/income) ~ income + persons, data = FoodExpenditure)
Quantile residuals:
Min 1Q Median 3Q Max
-2.533 -0.460 0.170 0.642 1.773
Coefficients (mean model with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.62255 0.22385 -2.78 0.0054 **
income -0.01230 0.00304 -4.05 5.1e-05 ***
persons 0.11846 0.03534 3.35 0.0008 ***
Phi coefficients (precision model with identity link):
Estimate Std. Error z value Pr(>|z|)
(phi) 35.61 8.08 4.41 1e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Type of estimator: ML (maximum likelihood)
Log-likelihood: 45.3 on 4 Df
Pseudo R-squared: 0.388
Number of iterations: 28 (BFGS) + 4 (Fisher scoring)
## nested model comparisons via Wald and LR testsfe_beta2<-betareg(I(food/income)~income, data =FoodExpenditure)lrtest(fe_beta, fe_beta2)
Likelihood ratio test
Model 1: I(food/income) ~ income + persons
Model 2: I(food/income) ~ income
#Df LogLik Df Chisq Pr(>Chisq)
1 4 45.3
2 3 40.5 -1 9.65 0.0019 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Wald test
Model 1: I(food/income) ~ income + persons
Model 2: I(food/income) ~ income
Res.Df Df Chisq Pr(>Chisq)
1 34
2 35 -1 11.2 8e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Section 3 from online supplements to Simas et al. (2010)## mean model as in gy above## precision model with regressor tempgy2<-betareg(yield~batch+temp|temp, data =GasolineYield)## MLE column in Table 19summary(gy2)
Call:
betareg(formula = yield ~ batch + temp | temp, data = GasolineYield)
Quantile residuals:
Min 1Q Median 3Q Max
-2.104 -0.585 -0.143 0.690 2.520
Coefficients (mean model with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.923236 0.183526 -32.27 < 2e-16 ***
batch1 1.601988 0.063856 25.09 < 2e-16 ***
batch2 1.297266 0.099100 13.09 < 2e-16 ***
batch3 1.565338 0.099739 15.69 < 2e-16 ***
batch4 1.030072 0.063288 16.28 < 2e-16 ***
batch5 1.154163 0.065643 17.58 < 2e-16 ***
batch6 1.019445 0.066351 15.36 < 2e-16 ***
batch7 0.622259 0.065632 9.48 < 2e-16 ***
batch8 0.564583 0.060185 9.38 < 2e-16 ***
batch9 0.359439 0.067141 5.35 8.6e-08 ***
temp 0.010359 0.000436 23.75 < 2e-16 ***
Phi coefficients (precision model with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.36409 1.22578 1.11 0.27
temp 0.01457 0.00362 4.03 5.7e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Type of estimator: ML (maximum likelihood)
Log-likelihood: 87 on 13 Df
Pseudo R-squared: 0.952
Number of iterations: 33 (BFGS) + 28 (Fisher scoring)