Residuals for Probabilistic Regression Models

Generic function and default method for (randomized) quantile residuals, PIT, Pearson, and raw response residuals based on distributions3 support.

proresiduals(object, ...)

# S3 method for default
proresiduals(
  object,
  newdata = NULL,
  type = c("quantile", "pit", "pearson", "response"),
  random = TRUE,
  prob = NULL,
  delta = NULL,
  ...
)

Arguments

object: an object for which newresponse and prodist methods are available.
...: further parameters passed to methods.
newdata: optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used.
type: character indicating whether quantile (default), PIT, Pearson, or raw response residuals should be computed.
random: logical or numeric. Should randomized residuals be computed for type = "quantile" and "pit"? The default is TRUE. If set to FALSE, fixed quantiles at the probabilities given in prob are used (defaulting to mid-quantiles). If random > 1, then multiple random replications of quantile or PIT residuals are computed. For other residual types random has no effect.
prob: numeric. Fixed probabilities for the quantile or PIT residuals when random = FALSE.
delta: numeric. The minimal difference used to compute the range of probabilities corresponding to each observation in order to obtain (randomized) "quantile" or "pit" residuals. For NULL, the minimal observed difference in the response divided by 5e-6 is used. Ignored for continuous distributions.

Value

A vector or matrix of residuals. A matrix of residuals is returned if more than one replication of quantile or PIT residuals is computed, i.e., if either random > 1 or random = FALSE and length(prob) > 1.

Details

The new generic function proresiduals comes with a powerful default method that is based on the following idea: newresponse and prodist can be used to extract the observed response and expected distribution for it, respectively. For all model classes that have methods for these two generic functions, proresiduals can compute a range of different types of residuals.

The simplest residuals are the so-called "response" residuals, which are the differences between the observations and the expected means. The "pearson" residuals additionally standardize these differences by the square root of the expected variance. Thus, "response" residuals are based only on the first moment and "pearson" residuals only on the first two moments.
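The two moment-based definitions can be written out by hand. The following is a minimal base R sketch, not the package's implementation: it assumes a Poisson GLM fitted to simulated data (the data and the names d, m, mu are purely illustrative), for which the expected variance equals the expected mean.

```r
# Simulated count data and a Poisson GLM (illustrative assumption)
set.seed(1)
d <- data.frame(x = runif(100))
d$y <- rpois(100, lambda = exp(0.5 + d$x))
m <- glm(y ~ x, data = d, family = poisson)

# Expected mean; for the Poisson distribution the variance equals the mean
mu <- fitted(m)

# "response" residuals: observation minus expected mean
r_response <- d$y - mu

# "pearson" residuals: additionally divide by the square root of the variance
r_pearson <- (d$y - mu) / sqrt(mu)
```

For this model class the results agree with residuals(m, type = "response") and residuals(m, type = "pearson") from the stats package.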

To assess the entire distribution and not just the first moments, there are also residuals based on the probability integral transform (PIT). For regression models with a continuous response distribution, "pit" residuals (see Warton et al. 2017) are simply the expected cumulative distribution function (CDF) evaluated at the observations (Dawid 1984). For discrete distributions, a uniform random value is drawn from the range of probabilities between the CDF at the observation and the supremum of the CDF to the left of it. If the model fits well, the PIT residuals should be uniformly distributed.
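For a discrete response, the randomization can be sketched in a few lines of base R. This is an illustration under simplifying assumptions (a Poisson distribution with known mean lambda and integer support, so that the left limit of the CDF at y is the CDF at y - 1), not the package's implementation.

```r
# Simulated Poisson observations with known mean (illustrative assumption)
set.seed(2)
lambda <- 3
y <- rpois(200, lambda)

# Probability interval for each observation
p_upper <- ppois(y, lambda)      # CDF at the observation
p_lower <- ppois(y - 1, lambda)  # left limit of the CDF (0 for y = 0)

# Randomized PIT residuals: one uniform draw within each interval
u <- runif(length(y), min = p_lower, max = p_upper)
```

Because the data are simulated from the assumed distribution, a histogram of u should look approximately flat on the unit interval.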

In order to obtain normally distributed residuals for well-fitting models (as is often desired in linear regression models), "quantile" residuals, proposed by Dunn and Smyth (1996), additionally transform the PIT residuals by the standard normal quantile function.
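A small base R sketch of this transformation for the continuous case follows. It assumes a normal response with known mean and standard deviation (the values of mu and sigma are illustrative); in that special case the quantile residuals reduce to the familiar standardized residuals.

```r
# Simulated normal observations with known parameters (illustrative assumption)
set.seed(3)
mu <- 2
sigma <- 1.5
y <- rnorm(100, mean = mu, sd = sigma)

# PIT residuals: CDF of the assumed distribution evaluated at the observations
pit <- pnorm(y, mean = mu, sd = sigma)

# Quantile residuals: standard normal quantile function applied to the PIT
r_quantile <- qnorm(pit)

# For a normal model this is the same as standardizing the observations
r_standardized <- (y - mu) / sigma
```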

As quantile residuals and PIT residuals are subject to randomness for discrete distributions (and also for mixed discrete-continuous distributions), it is sometimes useful to explore the extent of the random variation by obtaining multiple replications. In proresiduals this can be achieved by setting random > 1.
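What such replications look like can be sketched in base R by repeating the random draw. The example assumes a Poisson distribution with known mean and is only an illustration of the idea, not of how proresiduals computes its result; the replications are stored as the columns of a matrix, matching the return value described above.

```r
# Simulated Poisson observations with known mean (illustrative assumption)
set.seed(4)
lambda <- 3
y <- rpois(50, lambda)

# Probability interval for each observation
p_lower <- ppois(y - 1, lambda)
p_upper <- ppois(y, lambda)

# Three independent replications of randomized quantile residuals,
# one column per replication
reps <- replicate(3, qnorm(runif(length(y), min = p_lower, max = p_upper)))

# Spread of the residuals across replications for each observation
spread <- apply(reps, 1, function(r) diff(range(r)))
```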

Alternatively, the randomness can be suppressed via random = FALSE, in which case one or more fixed quantiles of each probability interval are returned. The default is prob = 0.5, which corresponds to mid-quantile residuals (see Feng et al. 2020). Another common setting is prob = c(0, 1), which yields the range of possible residuals.
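The fixed-quantile variant replaces the uniform draw by a fixed position within each probability interval. A minimal base R sketch, again assuming a Poisson distribution with known mean (fixed_quantile is a hypothetical helper defined here for illustration, not a package function):

```r
# Simulated Poisson observations with known mean (illustrative assumption)
set.seed(5)
lambda <- 3
y <- rpois(50, lambda)

# Probability interval for each observation
p_lower <- ppois(y - 1, lambda)
p_upper <- ppois(y, lambda)

# Hypothetical helper: quantile residual at a fixed position prob in [0, 1]
fixed_quantile <- function(prob) {
  qnorm(p_lower + prob * (p_upper - p_lower))
}

# Mid-quantile residuals (position 0.5 within each interval)
r_mid <- fixed_quantile(0.5)

# Range of possible residuals (positions 0 and 1), one column each
r_range <- cbind(lower = fixed_quantile(0), upper = fixed_quantile(1))
```

Note that in this sketch the lower end of the range is -Inf whenever y = 0, because the lower probability is then exactly 0.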

References

Dawid AP (1984). “Present Position and Potential Developments: Some Personal Views: Statistical Theory: The Prequential Approach.” Journal of the Royal Statistical Society A, 147(2), 278--292. doi:10.2307/2981683

Dunn PK, Smyth GK (1996). “Randomized Quantile Residuals.” Journal of Computational and Graphical Statistics, 5(3), 236--244. doi:10.2307/1390802

Feng C, Li L, Sadeghpour A (2020). “A Comparison of Residual Diagnosis Tools for Diagnosing Regression Models for Count Data.” BMC Medical Research Methodology, 20(175), 1--21. doi:10.1186/s12874-020-01055-2

Warton DI, Thibaut L, Wang YA (2017). “The PIT-Trap -- A ‘Model-Free’ Bootstrap Procedure for Inference about Regression Models with Discrete, Multivariate Responses.” PLOS ONE, 12(7), 1--18. doi:10.1371/journal.pone.0181790

Examples