an object for which newresponse and prodist methods are available.
…
further parameters passed to methods.
newdata
optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used.
type
character indicating whether quantile (default), PIT, Pearson, or raw response residuals should be computed.
nsim
integer. The number of randomly simulated residuals of type = “quantile” or “pit”. By default one simulation is returned.
prob
numeric. Instead of simulating the probabilities (between 0 and 1) for type = “quantile” or “pit”, a vector of probabilities can be specified, e.g., prob = 0.5 corresponding to mid-quantile residuals.
delta
numeric. The minimal difference used to compute the range of probabilities corresponding to each observation in order to get (randomized) “quantile” or “pit” residuals. For NULL, the minimal observed difference in the response divided by 5e-6 is used. Ignored for continuous distributions.
Details
The new generic function proresiduals comes with a powerful default method that is based on the following idea: newresponse and prodist can be used to extract the observed response and expected distribution for it, respectively. For all model classes that have methods for these two generic functions, proresiduals can compute a range of different types of residuals.
The simplest residuals are the so-called “response” residuals, which are the differences between the observations and the expected means. The “pearson” residuals additionally standardize these differences by the square root of the expected variance. Thus, these residuals are based only on the first moment and on the first two moments, respectively.
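As a minimal base R sketch of these two definitions, the following assumes a Poisson GLM fitted to simulated data (the data frame d and its variables are hypothetical and only serve as illustration). For the Poisson distribution the expected variance equals the expected mean, so the Pearson residuals divide by sqrt(mu).

```r
## Illustrative data (hypothetical): Poisson counts with one regressor
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- rpois(50, lambda = exp(0.5 + 0.3 * d$x))
m <- glm(y ~ x, data = d, family = poisson)

mu <- fitted(m)                          # expected means
res_response <- d$y - mu                 # observation minus expected mean
res_pearson <- res_response / sqrt(mu)   # Poisson: expected variance equals the mean

## both agree with the residuals() method for glm objects
all.equal(unname(res_response), unname(residuals(m, type = "response")))
all.equal(unname(res_pearson), unname(residuals(m, type = "pearson")))
```

This uses only base R and is not meant to reproduce the internals of proresiduals.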
To assess the entire distribution and not just the first moments, there are also residuals based on the probability integral transform (PIT). For regression models with a continuous response distribution, “pit” residuals (see Warton et al. 2017) are simply the expected cumulative distribution function (CDF) evaluated at the observations (Dawid 1984). For discrete distributions, a uniform random value is drawn from the range of probabilities between the CDF at the observation and the supremum of the CDF to the left of it. If the model fits well, the PIT residuals should be uniformly distributed.
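The randomized PIT construction for a discrete response can be sketched in base R as follows. The counts y and the expected means lambda are hypothetical values chosen for illustration, and a Poisson distribution is assumed. For a count distribution the supremum of the CDF to the left of y is simply the CDF at y - 1.

```r
set.seed(2)
y <- c(0, 1, 3)            # hypothetical observed counts
lambda <- c(1.2, 0.8, 2)   # hypothetical expected Poisson means

upper <- ppois(y, lambda)      # CDF at the observation
lower <- ppois(y - 1, lambda)  # left limit of the CDF (0 for y = 0)

## one uniform draw per observation within its probability interval
pit <- runif(length(y), min = lower, max = upper)
```

Each draw in pit lies between lower and upper for its observation; rerunning without the seed yields different values, which is the randomness discussed for discrete distributions.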
In order to obtain normally distributed residuals for well-fitting models (as is often desired in linear regression models), “quantile” residuals, proposed by Dunn and Smyth (1996), additionally transform the PIT residuals by the standard normal quantile function.
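A small base R sketch of this transformation, assuming a normal response distribution with hypothetical observations y, expected means mu, and standard deviation sigma: in this special case the quantile residuals coincide with the standardized differences (y - mu) / sigma.

```r
y <- c(1.3, -0.4, 2.1)   # hypothetical observations
mu <- c(1, 0, 1.5)       # hypothetical expected means
sigma <- 0.8             # hypothetical standard deviation

pit <- pnorm(y, mean = mu, sd = sigma)  # PIT residuals: CDF at the observations
qres <- qnorm(pit)                      # quantile residuals on the normal scale

## for a normal response this reduces to the standardized difference
all.equal(qres, (y - mu) / sigma)
```

For other distributions the same two steps apply with the corresponding CDF in place of pnorm.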
As quantile residuals and PIT residuals are subject to randomness for discrete distributions (and also for mixed discrete-continuous distributions), it is sometimes useful to explore the extent of the random variation. This can be done either by obtaining multiple replications (via nsim) or by computing fixed quantiles of each probability interval such as prob = 0.5 (corresponding to mid-quantile residuals, see Feng et al. 2020). Another common setting is prob = c(0, 1) yielding the range of possible residuals.
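The effect of fixed probabilities can be sketched in base R as follows. This assumes that prob is interpreted as the fraction along each observation's probability interval (a linear interpolation between the lower and upper CDF values), which is an assumption for illustration and not taken from the package source. The counts y and means lambda are hypothetical.

```r
y <- c(0, 1, 3)            # hypothetical observed counts
lambda <- c(1.2, 0.8, 2)   # hypothetical expected Poisson means

lower <- ppois(y - 1, lambda)  # left limit of the CDF
upper <- ppois(y, lambda)      # CDF at the observation

## one column per probability: lower end, midpoint, upper end of each interval
prob <- c(0, 0.5, 1)
pit <- sapply(prob, function(p) lower + p * (upper - lower))
colnames(pit) <- paste0("p_", prob)

## map to the normal scale; a lower limit of 0 gives -Inf
qres <- qnorm(pit)
```

The middle column corresponds to mid-quantile residuals, while the outer columns bound the values that any random draw could take. The result is a matrix with one row per observation, mirroring the matrix return value described for length(prob) > 1.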
Value
A vector or matrix of residuals. A matrix of residuals is returned if more than one replication of quantile or PIT residuals is computed, i.e., if either nsim > 1 or prob is specified with length(prob) > 1.
References
Dawid AP (1984). “Present Position and Potential Developments: Some Personal Views: Statistical Theory: The Prequential Approach.” Journal of the Royal Statistical Society A, 147(2), 278–292. doi:10.2307/2981683.
Dunn PK, Smyth GK (1996). “Randomized Quantile Residuals.” Journal of Computational and Graphical Statistics, 5(3), 236–244. doi:10.2307/1390802.
Feng C, Li L, Sadeghpour A (2020). “A Comparison of Residual Diagnosis Tools for Diagnosing Regression Models for Count Data.” BMC Medical Research Methodology, 20(175), 1–21. doi:10.1186/s12874-020-01055-2.
Warton DI, Thibaut L, Wang YA (2017). “The PIT-Trap – A ‘Model-Free’ Bootstrap Procedure for Inference about Regression Models with Discrete, Multivariate Responses.” PLOS ONE, 12(7), 1–18. doi:10.1371/journal.pone.0181790.
See Also
qnorm, qqrplot
Examples
library("topmodels")

## Poisson GLM for FIFA 2018 data
data("FIFA2018", package = "distributions3")
m <- glm(goals ~ difference, data = FIFA2018, family = poisson)

## random quantile residuals (on original data)
proresiduals(m)
## various flavors of residuals on small new data
nd <- data.frame(goals = c(1, 1, 1), difference = c(-1, 0, 1))

## quantile residuals: random (1 sample), random (5 samples), mid-quantile (non-random)
proresiduals(m, newdata = nd, type = "quantile")
1 2 3
0.08206473 0.16794331 -0.79489784
proresiduals(m, newdata = nd, type = "quantile", nsim = 5)