Residuals for Probabilistic Regression Models

Description

Generic function and default method for (randomized) quantile residuals, PIT, Pearson, and raw response residuals based on distributions3 support.

Usage

proresiduals(object, ...)

## Default S3 method:
proresiduals(
  object,
  newdata = NULL,
  type = c("quantile", "pit", "pearson", "response"),
  nsim = NULL,
  prob = NULL,
  delta = NULL,
  ...
)

Arguments

object an object for which a newresponse and a prodist method is available.
... further parameters passed to methods.
newdata optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used.
type character indicating whether quantile (default), PIT, Pearson, or raw response residuals should be computed.
nsim integer. The number of randomly simulated residuals of type = “quantile” or “pit”. By default one simulation is returned.
prob numeric. Instead of simulating the probabilities (between 0 and 1) for type = “quantile” or “pit”, a vector of probabilities can be specified, e.g., prob = 0.5 corresponding to mid-quantile residuals.
delta numeric. The minimal difference used to compute the range of probabilities corresponding to each observation, from which (randomized) “quantile” or “pit” residuals are obtained. For NULL, the minimal observed difference in the response divided by 5e-6 is used. Ignored for continuous distributions.

Details

The new generic function proresiduals comes with a powerful default method that is based on the following idea: newresponse and prodist can be used to extract the observed response and expected distribution for it, respectively. For all model classes that have methods for these two generic functions, proresiduals can compute a range of different types of residuals.

The simplest residuals are the so-called “response” residuals, which are simply the differences between the observations and the expected means. The “pearson” residuals additionally standardize these differences by the square root of the expected variance. Thus, these two types of residuals are based only on the first moment and on the first two moments, respectively.
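As a rough illustration in base R (a sketch, not the package's implementation), both types can be computed by hand for a Poisson model, where the expected variance equals the expected mean. The means in `mu` are an assumption: they are the values implied by the response residuals printed in the Examples section for the three new observations with goals = 1. All variable names are illustrative.

```r
## Sketch only: base R, not the topmodels implementation.
y  <- c(1, 1, 1)                          # observed goals
mu <- c(0.8181454, 1.2370397, 1.8704100)  # expected Poisson means (assumed, see lead-in)

## response residuals: observation minus expected mean (first moment only)
res_response <- y - mu

## Pearson residuals: additionally divide by the expected standard deviation;
## for the Poisson distribution the variance equals the mean
res_pearson <- (y - mu) / sqrt(mu)

print(round(res_response, 4))
print(round(res_pearson, 4))
```

Up to rounding of `mu`, these values should agree with the type = "response" and type = "pearson" output in the Examples section.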

To assess the entire distribution and not just the first moments, there are also residuals based on the probability integral transform (PIT). For regression models with a continuous response distribution, “pit” residuals (see Warton et al. 2017) are simply the expected cumulative distribution function (CDF) evaluated at the observations (Dawid 1984). For discrete distributions, a uniform random value is drawn from the range of probabilities between the CDF at the observation and the supremum of the CDF to the left of it. If the model fits well, the PIT residuals should be uniformly distributed.
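The randomization for a discrete distribution can be sketched in base R as follows. This is an illustration under assumptions rather than the package's code: the response is taken to be Poisson, so the supremum of the CDF to the left of an integer observation y is the CDF at y - 1, and the means in `mu` are the values implied by the response residuals in the Examples section.

```r
## Sketch only: randomized PIT residuals for a Poisson response in base R.
set.seed(1)                               # the draws are random, fix the seed
y  <- c(1, 1, 1)                          # observed counts
mu <- c(0.8181454, 1.2370397, 1.8704100)  # expected Poisson means (assumed)

p_lower <- ppois(y - 1, lambda = mu)      # P(Y < y): CDF just left of the observation
p_upper <- ppois(y, lambda = mu)          # P(Y <= y): CDF at the observation

## one uniform draw per observation from its own probability interval
pit <- runif(length(y), min = p_lower, max = p_upper)

print(cbind(p_lower, pit, p_upper))
```

The interval bounds `p_lower` and `p_upper` correspond to the output of type = "pit" with prob = c(0, 1) in the Examples section.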

In order to obtain normally distributed residuals for well-fitting models (as is often desired in linear regression models), “quantile” residuals, proposed by Dunn and Smyth (1996), additionally transform the PIT residuals by the standard normal quantile function.
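A small base R sketch may help to see what this transformation does in the continuous case. The data, means, and standard deviation below are made up for illustration. For a normal response distribution, transforming the PIT residuals with qnorm simply recovers the standardized differences between observations and means.

```r
## Sketch only: quantile residuals for a normal response in base R.
y     <- c(0.3, 1.2, 2.5)   # hypothetical observations
mu    <- c(0.5, 1.0, 2.0)   # hypothetical expected means
sigma <- 0.8                # hypothetical expected standard deviation

pit  <- pnorm(y, mean = mu, sd = sigma)  # PIT residuals: CDF at the observations
qres <- qnorm(pit)                       # quantile residuals: standard normal quantiles

## for the normal distribution these coincide with (y - mu) / sigma
print(cbind(pit, qres, standardized = (y - mu) / sigma))
```

For other distributions the two no longer coincide, which is where quantile residuals add information beyond the first two moments.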

As quantile residuals and PIT residuals are subject to randomness for discrete distributions (and also for mixed discrete-continuous distributions), it is sometimes useful to explore the extent of the random variation. This can be done either by obtaining multiple replications (via nsim) or by computing fixed quantiles of each probability interval such as prob = 0.5 (corresponding to mid-quantile residuals, see Feng et al. 2020). Another common setting is prob = c(0, 1) yielding the range of possible residuals.
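Both strategies can be sketched in base R for a Poisson response. This is an illustration, not the package's implementation: it assumes that a fixed prob is mapped linearly into each observation's probability interval, which reproduces the printed Examples output up to rounding. The helper `pit_at` and the other names are hypothetical, and `mu` holds the means implied by the response residuals in the Examples section.

```r
## Sketch only: exploring the randomness of quantile residuals in base R.
set.seed(1)
y  <- c(1, 1, 1)                          # observed counts
mu <- c(0.8181454, 1.2370397, 1.8704100)  # expected Poisson means (assumed)

p_lower <- ppois(y - 1, lambda = mu)      # lower end of each probability interval
p_upper <- ppois(y, lambda = mu)          # upper end of each probability interval

## hypothetical helper: fixed position 'prob' within each interval
pit_at <- function(prob) p_lower + prob * (p_upper - p_lower)

mid_quantile <- qnorm(pit_at(0.5))                    # mid-quantile residuals
pit_range <- cbind(r_0 = pit_at(0), r_1 = pit_at(1))  # range of possible PIT residuals

## alternatively: several random replications, one column per replication
sims <- replicate(5, qnorm(runif(length(y), min = p_lower, max = p_upper)))

print(round(mid_quantile, 4))
print(round(pit_range, 4))
print(dim(sims))
```

Fixed probabilities give reproducible residuals, while the replications show how much a single random draw can vary between runs.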

Value

A vector or matrix of residuals. A matrix of residuals is returned if more than one replication of quantile or PIT residuals is computed, i.e., if either nsim > 1 or length(prob) > 1.

References

Dawid AP (1984). “Present Position and Potential Developments: Some Personal Views: Statistical Theory: The Prequential Approach.” Journal of the Royal Statistical Society A, 147(2), 278–292. doi:10.2307/2981683.

Dunn PK, Smyth GK (1996). “Randomized Quantile Residuals.” Journal of Computational and Graphical Statistics, 5(3), 236–244. doi:10.2307/1390802.

Feng C, Li L, Sadeghpour A (2020). “A Comparison of Residual Diagnosis Tools for Diagnosing Regression Models for Count Data.” BMC Medical Research Methodology, 20(175), 1–21. doi:10.1186/s12874-020-01055-2.

Warton DI, Thibaut L, Wang YA (2017). “The PIT-Trap – A ‘Model-Free’ Bootstrap Procedure for Inference about Regression Models with Discrete, Multivariate Responses.” PLOS ONE, 12(7), 1–18. doi:10.1371/journal.pone.0181790.

See Also

qnorm, qqrplot

Examples

library("topmodels")

## Poisson GLM for FIFA 2018 data
data("FIFA2018", package = "distributions3")
m <- glm(goals ~ difference, data = FIFA2018, family = poisson)

## random quantile residuals (on original data)
proresiduals(m)
           1            2            3            4            5            6 
 2.318279308 -0.439931985 -0.635993592 -0.228517939  1.330527083  0.119408070 
           7            8            9           10           11           12 
-0.691318098 -1.261475234  1.124774783 -1.135364668  0.854345941 -0.287203588 
          13           14           15           16           17           18 
-1.399154818 -0.537545526  1.673296348  1.357837429 -0.266163653 -1.017095274 
          19           20           21           22           23           24 
-0.299647076 -1.097565534  0.200564160 -0.492341182  1.451596357 -0.280079627 
          25           26           27           28           29           30 
 0.029813079  0.162864349 -0.541493790 -0.022916654 -0.110272407  0.472344302 
          31           32           33           34           35           36 
-0.655038103 -0.239935047 -0.590378265 -1.368867509 -0.981775647  0.654456294 
          37           38           39           40           41           42 
-0.341237660  0.241671459  0.390024291 -1.430509899 -1.324529903  1.806258033 
          43           44           45           46           47           48 
 0.526121929 -0.866904934  0.691701969  0.277048344  0.003968664  0.538077181 
          49           50           51           52           53           54 
-0.607479271  0.204036555 -0.288478316  0.418802618  0.055265025 -2.376580687 
          55           56           57           58           59           60 
 0.241669307  0.412564619 -0.982275787  0.372820416  0.399655286  0.609147364 
          61           62           63           64           65           66 
-1.563183304  0.233149815  0.154015780 -1.309397229  0.004820310  0.366256689 
          67           68           69           70           71           72 
-0.142231828  0.098483468  1.500628982 -2.431065324 -0.971027143  1.722132733 
          73           74           75           76           77           78 
 0.600758550 -0.703870635  0.007808108  0.387749309  1.659018660  1.084594729 
          79           80           81           82           83           84 
 2.211095304 -0.105437518 -1.609917554 -0.568191736 -0.239958229  0.900466604 
          85           86           87           88           89           90 
-0.284024653  1.267558961 -0.055485926  0.685955773  0.560048454  0.723589880 
          91           92           93           94           95           96 
-2.278037098  1.122593637 -0.559633824 -0.280754871 -0.440682995 -0.325560998 
          97           98           99          100          101          102 
 1.716262817  1.829753025  0.838321759 -0.241908939 -0.676976786  0.162228314 
         103          104          105          106          107          108 
 0.011839452  0.428791284  0.047707767 -1.092360288  0.844860351  0.873321161 
         109          110          111          112          113          114 
-0.090481912 -1.401157372  0.368627520  0.253662417 -2.178782142  0.481688254 
         115          116          117          118          119          120 
-0.174440541  0.756226623 -1.321670808  0.648270922 -0.219040241 -0.244000497 
         121          122          123          124          125          126 
-0.482504430 -1.766159153  0.355420094 -0.485690358  0.383893837 -0.564370527 
         127          128 
 1.727234532  1.358035641 
## various flavors of residuals on small new data
nd <- data.frame(goals = c(1, 1, 1), difference = c(-1, 0, 1))

## quantile residuals: random (1 sample), random (5 samples), mid-quantile (non-random)
proresiduals(m, newdata = nd, type = "quantile")
          1           2           3 
 0.08206473  0.16794331 -0.79489784 
proresiduals(m, newdata = nd, type = "quantile", nsim = 5)
             r_1        r_2        r_3        r_4        r_5
[1,]  0.04045295  0.6490735  0.4708528  0.3654437  0.1385858
[2,] -0.30342326 -0.3126504 -0.5265108  0.2468159  0.2474505
[3,] -0.67560752 -0.5234960 -0.9314298 -0.2693369 -0.3895383
proresiduals(m, newdata = nd, type = "quantile", prob = 0.5)
          1           2           3 
 0.31008612 -0.07586646 -0.52976162 
## PIT residuals (without transformation to normal): random vs. minimum/maximum quantile
proresiduals(m, newdata = nd, type = "pit", nsim = 5)
           r_1       r_2       r_3       r_4       r_5
[1,] 0.7852379 0.5106289 0.5752134 0.5817532 0.6255251
[2,] 0.4736438 0.5205103 0.3377620 0.3443584 0.5022730
[3,] 0.2387543 0.2392821 0.2178502 0.1982118 0.3296159
proresiduals(m, newdata = nd, type = "pit", prob = c(0, 1))
           r_0       r_1
[1,] 0.4412492 0.8022553
[2,] 0.2902421 0.6492832
[3,] 0.1540605 0.4422167
## raw response residuals (observation - expected mean)
proresiduals(m, newdata = nd, type = "response")
         1          2          3 
 0.1818546 -0.2370397 -0.8704100 
## standardized Pearson residuals (response residuals divided by standard deviation)
proresiduals(m, newdata = nd, type = "pearson")
         1          2          3 
 0.2010523 -0.2131225 -0.6364371 
## compute residuals by manually obtaining distribution and response
## proresiduals(procast(m, newdata = nd, drop = TRUE), nd$goals)