Generic function and default method for computing (randomized) quantile, PIT, Pearson, and raw response residuals, based on distributions3 support.

proresiduals(object, ...)

# S3 method for default
proresiduals(
  object,
  newdata = NULL,
  type = c("quantile", "pit", "pearson", "response"),
  random = TRUE,
  prob = NULL,
  delta = NULL,
  ...
)

Arguments

object

an object for which both a newresponse and a prodist method are available.

...

further parameters passed to methods.

newdata

optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used.

type

character indicating whether quantile (default), PIT, Pearson, or raw response residuals should be computed.

random

logical or numeric. Should random residuals be computed for type = "quantile" and "pit"? The default is TRUE and if set to FALSE, fixed quantiles given at the probabilities in prob are used (defaulting to mid-quantiles). If random > 1, then multiple random replications of quantile or PIT residuals are computed. For other residual types random has no effect.

prob

numeric. Fixed probabilities for the quantile or PIT residuals when random = FALSE.

delta

numeric. The minimal difference used to compute the range of probabilities corresponding to each observation in order to get (randomized) "quantile" or "pit" residuals. For NULL, the minimal observed difference in the response divided by 5e-6 is used. Ignored for continuous distributions.

Value

A vector or matrix of residuals. A matrix of residuals is returned if more than one replication of quantile or PIT residuals is computed, i.e., if either random > 1 or random = FALSE and length(prob) > 1.

Details

The new generic function proresiduals comes with a powerful default method that is based on the following idea: newresponse and prodist can be used to extract the observed response and expected distribution for it, respectively. For all model classes that have methods for these two generic functions, proresiduals can compute a range of different types of residuals.

The simplest type of residuals are the so-called "response" residuals, which simply compute the difference between the observations and the expected means. The "pearson" residuals additionally standardize these residuals by the square root of the expected variance. Thus, these residuals are based only on the first and on the first two moments, respectively.
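The two moment-based definitions can be illustrated with a minimal base R sketch. It uses simulated Poisson data and a plain glm() fit rather than proresiduals itself; the data and all object names (d, m0, mu, v) are assumptions made for illustration only.

```r
## Sketch: response and Pearson residuals from the first two moments
## (simulated data; all object names are illustrative)
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- rpois(50, lambda = exp(0.3 + 0.5 * d$x))
m0 <- glm(y ~ x, data = d, family = poisson)

mu <- fitted(m0)  # expected mean of each observation
v  <- mu          # Poisson: expected variance equals the expected mean

r_response <- d$y - mu              # observation minus expected mean
r_pearson  <- r_response / sqrt(v)  # additionally scaled by the standard deviation

head(cbind(response = r_response, pearson = r_pearson))
```

For a Poisson GLM without weights, these values should agree with residuals(m0, type = "response") and residuals(m0, type = "pearson") from base R.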

To assess the entire distribution and not just the first moments, there are also residuals based on the probability integral transform (PIT). For regression models with a continuous response distribution, "pit" residuals (see Warton et al. 2017) are simply the expected cumulative distribution function (CDF) evaluated at the observations (Dawid 1984). For discrete distributions, a uniform random value is drawn from the range of probabilities between the CDF at the observation and the supremum of the CDF to the left of it. If the model fits well, the PIT residuals should be uniformly distributed.
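For a discrete response, the randomization step described above can be sketched in base R as follows. This is only an illustration for a Poisson distribution with made-up counts and means, not the package's internal code; y, mu, lower, and upper are hypothetical names.

```r
## Sketch: randomized PIT residuals for a Poisson response
## (observations and means are made up for illustration)
set.seed(2)
y  <- c(0, 1, 3, 2)          # observed counts
mu <- c(0.8, 1.2, 1.9, 2.5)  # expected Poisson means

upper <- ppois(y, mu)      # CDF at the observation
lower <- ppois(y - 1, mu)  # supremum of the CDF to the left (0 for y = 0)

## one uniform draw per observation from its probability interval
pit <- runif(length(y), min = lower, max = upper)
cbind(lower, pit, upper)
```

For a continuous distribution the interval collapses to a single point, so the PIT residual would simply be the CDF at the observation, e.g., pnorm(y, mean, sd) for a normal response.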

In order to obtain normally distributed residuals for well-fitting models (as is often desired in linear regression models), "quantile" residuals, proposed by Dunn and Smyth (1996), additionally transform the PIT residuals by the standard normal quantile function.
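The additional transformation can be sketched in base R. The example simulates Poisson data whose true means are known, so the "model" fits perfectly by construction; the sample size, the range of the means, and all object names are assumptions for illustration.

```r
## Sketch: quantile residuals = standard normal quantiles of the PIT residuals
## (simulated Poisson data with known means; all names are illustrative)
set.seed(123)
n  <- 2000
mu <- exp(runif(n, min = -0.5, max = 1.5))  # "true" means of a well-specified model
y  <- rpois(n, lambda = mu)

## randomized PIT residuals for a discrete response
pit <- runif(n, min = ppois(y - 1, mu), max = ppois(y, mu))

## transform to the normal scale
qres <- qnorm(pit)

## for a well-fitting model these should be close to 0 and 1
c(mean = mean(qres), sd = sd(qres))
```

A normal Q-Q plot of qres, e.g., via qqnorm(qres), would be one way to inspect the result graphically.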

As quantile residuals and PIT residuals are subject to randomness for discrete distributions (and also for mixed discrete-continuous distributions), it is sometimes useful to explore the extent of the random variation by obtaining multiple replications. In proresiduals this can be achieved by setting random > 1.
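What several random replications look like can be sketched in base R with replicate(). The counts, means, number of replications, and column labels below are illustrative assumptions and do not reproduce the package's internals.

```r
## Sketch: several replications of randomized quantile residuals
## (made-up Poisson counts and means; all names are illustrative)
set.seed(3)
y  <- c(0, 1, 3, 2)
mu <- c(0.8, 1.2, 1.9, 2.5)
lower <- ppois(y - 1, mu)
upper <- ppois(y, mu)

R <- 5  # number of replications
qres <- replicate(R, qnorm(runif(length(y), min = lower, max = upper)))
colnames(qres) <- paste0("r_", seq_len(R))
qres

## spread of the replications for each observation
t(apply(qres, 1, range))
```

Each row corresponds to one observation and each column to one replication, so the row-wise spread indicates how much a residual depends on the random draw.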

Alternatively, the randomness can be suppressed via random = FALSE, in which case one or more fixed quantiles of each probability interval are returned. The default is prob = 0.5, which corresponds to mid-quantile residuals (see Feng et al. 2020). Another common setting is prob = c(0, 1), which yields the range of possible residuals.
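The fixed-probability variant can be sketched in base R by interpolating within each probability interval. The means below are back-computed as 1 minus the response residuals reported for the small new data set in the Examples, which is an assumption of this sketch; the interpolation formula and all object names are likewise illustrative rather than the package's code.

```r
## Sketch: non-random quantile residuals at fixed probabilities
## (Poisson means back-computed from the response residuals in the Examples)
y  <- c(1, 1, 1)
mu <- 1 - c(0.1818546, -0.2370397, -0.8704100)

lower <- ppois(y - 1, mu)  # left limit of the CDF
upper <- ppois(y, mu)      # CDF at the observation

prob <- c(0, 0.5, 1)       # minimum, mid-quantile, maximum
qres <- sapply(prob, function(p) qnorm(lower + p * (upper - lower)))
colnames(qres) <- paste0("r_", prob)
qres
```

Under these assumptions the columns should be close to the minimum, mid-quantile, and maximum quantile residuals shown in the Examples.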

References

Dawid AP (1984). “Present Position and Potential Developments: Some Personal Views: Statistical Theory: The Prequential Approach.” Journal of the Royal Statistical Society A, 147(2), 278--292. doi:10.2307/2981683

Dunn PK, Smyth GK (1996). “Randomized Quantile Residuals.” Journal of Computational and Graphical Statistics, 5(3), 236--244. doi:10.2307/1390802

Feng C, Li L, Sadeghpour A (2020). “A Comparison of Residual Diagnosis Tools for Diagnosing Regression Models for Count Data.” BMC Medical Research Methodology, 20(175), 1--21. doi:10.1186/s12874-020-01055-2

Warton DI, Thibaut L, Wang YA (2017). “The PIT-Trap -- A ‘Model-Free’ Bootstrap Procedure for Inference about Regression Models with Discrete, Multivariate Responses.” PLOS ONE, 12(7), 1--18. doi:10.1371/journal.pone.0181790

Examples

## Poisson GLM for FIFA 2018 data
data("FIFA2018", package = "distributions3")
m <- glm(goals ~ difference, data = FIFA2018, family = poisson)

## random quantile residuals
proresiduals(m)
#>            1            2            3            4            5            6 
#>  2.121391343 -0.870829887 -0.409313713 -0.398996087  1.279323226  0.423852923 
#>            7            8            9           10           11           12 
#> -0.721350510 -1.003574008  1.102676448 -0.546669720  0.962804005 -0.316228552 
#>           13           14           15           16           17           18 
#> -0.739981117  0.273003421  2.063776099  1.428015088 -0.453951606 -1.610287017 
#>           19           20           21           22           23           24 
#> -1.411194908 -0.934977924 -0.058239246 -0.821002708  1.627328789 -0.046773847 
#>           25           26           27           28           29           30 
#>  0.343772467  0.481905086 -0.869549229 -0.564616712  0.139767620  0.509241780 
#>           31           32           33           34           35           36 
#> -0.485255437 -1.375145617 -0.454371204 -1.949937093 -1.577362085  0.644553913 
#>           37           38           39           40           41           42 
#> -0.542332525  0.575688850  0.773677653 -2.234759032 -0.926803673  1.442863660 
#>           43           44           45           46           47           48 
#>  0.985674715 -1.264427978  0.466815688  0.303144930 -0.127912344  0.446301302 
#>           49           50           51           52           53           54 
#> -0.780019834  0.090552169 -0.722048020  0.246295435 -0.037847805 -0.897378928 
#>           55           56           57           58           59           60 
#> -0.392254615  0.631771062 -1.222177367  0.090525677  0.377167382  1.094367969 
#>           61           62           63           64           65           66 
#> -2.599488279  0.381903891 -0.054256968 -1.246356958 -0.137317322  0.779801188 
#>           67           68           69           70           71           72 
#>  0.315399157  0.042437121  1.683986030 -3.062763544 -1.490724714  1.764524040 
#>           73           74           75           76           77           78 
#>  0.556259169 -0.254498696  0.559267677 -0.074324329  1.630638756  1.399866470 
#>           79           80           81           82           83           84 
#>  2.211817402  0.331909157 -1.853638892 -0.373712781 -0.022654464  0.447119239 
#>           85           86           87           88           89           90 
#>  0.021875629  1.112887630  0.076203550  0.613008422  0.466150146  0.355965917 
#>           91           92           93           94           95           96 
#> -1.331674057  1.372869120 -0.653998253  0.122785883 -1.212792623 -0.203224183 
#>           97           98           99          100          101          102 
#>  1.808338824  1.411707437  0.812780327  0.146983853 -0.060088635  0.593423621 
#>          103          104          105          106          107          108 
#> -0.495442467  0.144175587  0.447625347 -0.598954092  0.926853890  1.357653473 
#>          109          110          111          112          113          114 
#> -0.268520589 -1.051176079 -0.331889615 -0.312948506 -0.792573244  0.316483384 
#>          115          116          117          118          119          120 
#>  0.125058985  0.863613275 -0.548998622  0.722704203  0.221340782  0.174326845 
#>          121          122          123          124          125          126 
#> -0.120542079 -1.018568723 -0.344973548 -0.003119137  0.351795087 -0.628584783 
#>          127          128 
#>  1.810441882  1.150576905 

## Pearson residuals
proresiduals(m, type = "pearson")
#>            1            2            3            4            5            6 
#>  2.430654322 -0.930334590 -1.014724759 -0.398804262  1.305927102 -0.064013188 
#>            7            8            9           10           11           12 
#> -0.613884092 -0.914301665  1.519090649 -1.093055385  0.993556817 -0.424264684 
#>           13           14           15           16           17           18 
#> -1.113056199 -0.211616188  2.038308437  1.153169320 -0.546750414 -0.944255416 
#>           19           20           21           22           23           24 
#> -0.840278169 -0.792913544  0.116280482 -0.548303440  1.537474481 -0.111528206 
#>           25           26           27           28           29           30 
#> -0.099738912  0.339830526 -1.079060716 -0.274111521 -0.360546190 -0.066833526 
#>           31           32           33           34           35           36 
#> -0.691375999 -0.881237871 -0.908320460 -1.361897916 -1.065751519  0.562346952 
#>           37           38           39           40           41           42 
#> -0.625851688  0.190909297  0.375971301 -0.998686051 -1.222716781  1.953550708 
#>           43           44           45           46           47           48 
#>  0.688071124 -1.113023052  0.192354231  0.107004865  0.001191279  0.378023421 
#>           49           50           51           52           53           54 
#> -1.048037832 -0.333124464 -0.702612840  0.264228995 -0.223024138 -0.808461816 
#>           55           56           57           58           59           60 
#> -0.170029883  0.623708821 -0.857974497 -0.054672753  0.452546135  0.923900344 
#>           61           62           63           64           65           66 
#> -1.398279331  0.245655957 -0.333908002 -1.047632914 -0.051316840  0.453072234 
#>           67           68           69           70           71           72 
#> -0.027001956  0.287781642  1.634605655 -1.515806191 -1.135687571  1.664963346 
#>           73           74           75           76           77           78 
#>  0.627910298 -0.855301681  0.204392856  0.090052537  2.096199105  1.427808319 
#>           79           80           81           82           83           84 
#>  2.910552914  0.239342145 -1.071915077 -0.287530123 -0.178220460  0.635530844 
#>           85           86           87           88           89           90 
#> -0.438382284  1.014266304 -0.339381957  0.869416029  0.747455937  0.624804291 
#>           91           92           93           94           95           96 
#> -1.082035785  1.480840871 -1.023025797 -0.382201924 -1.016451750 -0.395336915 
#>           97           98           99          100          101          102 
#>  2.319462548  1.720582549  0.734690903 -0.246750527 -0.518360463  0.087311027 
#>          103          104          105          106          107          108 
#> -0.359633135 -0.067732640  0.018629138 -0.880499408  0.854954441  1.281307943 
#>          109          110          111          112          113          114 
#> -0.180142894 -1.130625614 -0.111669780 -0.315120066 -0.972503579  0.300292121 
#>          115          116          117          118          119          120 
#> -0.390918669  0.944711115 -0.976915794  0.313170650 -0.169708261 -0.256636287 
#>          121          122          123          124          125          126 
#> -0.332373025 -1.048426333 -0.144779744 -0.281712175  0.578986159 -1.071915077 
#>          127          128 
#>  1.891273576  1.071258286 

## various flavors of residuals on small new data
nd <- data.frame(goals = c(1, 1, 1), difference = c(-1, 0, 1))

## random quantile residuals
set.seed(0)
proresiduals(m, newdata = nd, type = "quantile")
#>          1          2          3 
#>  0.7223565 -0.2908823 -0.6393725 
set.seed(0)
proresiduals(m, newdata = nd, type = "quantile", random = 5)
#>             r_1        r_2        r_3        r_4         r_5
#> [1,]  0.7223565  0.3800686  0.7243461  0.4353980  0.01250164
#> [2,] -0.2908823  0.2958457  0.3303159 -0.4889856  0.09265594
#> [3,] -0.6393725 -0.7988924 -0.4002845 -0.7946336 -0.62879276

## underlying probability integral transform (PIT) without transformation to normal
set.seed(0)
proresiduals(m, newdata = nd, type = "pit", random = 5)
#>            r_1       r_2       r_3       r_4       r_5
#> [1,] 0.7649623 0.6480528 0.7655733 0.6683632 0.5049873
#> [2,] 0.3855707 0.6163260 0.6294193 0.3124259 0.5369116
#> [3,] 0.2612903 0.2121764 0.3444735 0.2134133 0.2647424

## raw response residuals (observation - expected mean)
proresiduals(m, newdata = nd, type = "response")
#>          1          2          3 
#>  0.1818546 -0.2370397 -0.8704100 

## standardized Pearson residuals (additionally divide by standard deviation)
proresiduals(m, newdata = nd, type = "pearson")

## (non-random) mid-quantile residuals
proresiduals(m, newdata = nd, type = "quantile", random = FALSE)
#>           1           2           3 
#>  0.31008612 -0.07586646 -0.52976162 

## minimum/median/maximum quantile residuals
proresiduals(m, newdata = nd, type = "quantile", random = FALSE, prob = c(0, 0.5, 1))
#>             r_0       r_0.5        r_1
#> [1,] -0.1478027  0.31008612  0.8497044
#> [2,] -0.5526775 -0.07586646  0.3833860
#> [3,] -1.0191728 -0.52976162 -0.1453513

## compute residuals by manually obtaining distribution and response
## proresiduals(procast(m, newdata = nd, drop = TRUE), nd$goals)