topmodels

Residuals for Probabilistic Regression Models

Description

Generic function and default method for computing (randomized) quantile, PIT, Pearson, and raw response residuals, based on distributions3 support.

Usage

proresiduals(object, ...)

## Default S3 method:
proresiduals(
  object,
  newdata = NULL,
  type = c("quantile", "pit", "pearson", "response"),
  nsim = NULL,
  prob = NULL,
  delta = NULL,
  ...
)

Arguments

`object`	an object for which `newresponse` and `prodist` methods are available.
`…`	further parameters passed to methods.
`newdata`	optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used.
`type`	character indicating whether quantile (default), PIT, Pearson, or raw response residuals should be computed.
`nsim`	integer. The number of randomly simulated residuals for `type = "quantile"` or `"pit"`. By default, one simulation is returned.
`prob`	numeric. Instead of simulating the probabilities (between 0 and 1) for `type = "quantile"` or `"pit"`, a vector of probabilities can be specified, e.g., `prob = 0.5` corresponding to mid-quantile residuals.
`delta`	numeric. The minimal difference used to compute the range of probabilities corresponding to each observation in order to obtain (randomized) `"quantile"` or `"pit"` residuals. For `NULL`, the minimal observed difference in the response divided by `5e-6` is used. Ignored for continuous distributions.

Details

The new generic function proresiduals comes with a powerful default method that is based on the following idea: newresponse and prodist can be used to extract the observed response and expected distribution for it, respectively. For all model classes that have methods for these two generic functions, proresiduals can compute a range of different types of residuals.

The simplest type of residuals are the so-called “response” residuals, which are simply the differences between the observations and the expected means. The “pearson” residuals additionally standardize these differences by the square root of the expected variance. Thus, these residuals are based only on the first moment and on the first two moments, respectively.
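The two moment-based definitions can be sketched in base R without the package. This is only an illustration, not the package's internal code: the observations are the three `goals = 1` values from the small new data set in the Examples, and the expected means `mu` are an assumption derived from the response residuals printed there (1 minus each residual). For a Poisson distribution the variance equals the mean.

```r
## observations and expected Poisson means
## (mu is derived from the printed response residuals in the Examples: 1 - residual)
y  <- c(1, 1, 1)
mu <- c(0.8181454, 1.2370397, 1.8704100)

## response residuals: observation minus expected mean
res_response <- y - mu

## Pearson residuals: additionally divide by the expected standard deviation,
## which is sqrt(mu) for a Poisson distribution
res_pearson <- (y - mu) / sqrt(mu)

round(res_response, 4)
round(res_pearson, 4)
```

Up to rounding, the results agree with the `type = "response"` and `type = "pearson"` output shown in the Examples.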

To assess the entire distribution and not just the first moments, there are also residuals based on the probability integral transform (PIT). For regression models with a continuous response distribution, “pit” residuals (see Warton et al. 2017) are simply the expected cumulative distribution function (CDF) evaluated at the observations (Dawid 1984). For discrete distributions, a uniform random value is drawn from the range of probabilities between the CDF at the observation and the supremum of the CDF to the left of it. If the model fits well, the PIT residuals should be uniformly distributed.
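For a discrete distribution, the randomization described above can be sketched in base R as follows. This is a hedged illustration for a Poisson response, not the package's implementation. The means `mu` are an assumption derived from the response residuals printed in the Examples, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only to make the draw reproducible

## observations and expected Poisson means (assumed, see lead-in)
y  <- c(1, 1, 1)
mu <- c(0.8181454, 1.2370397, 1.8704100)

## CDF at the observation: P(Y <= y)
upper <- ppois(y, lambda = mu)
## supremum of the CDF to the left of the observation: P(Y < y) = P(Y <= y - 1)
lower <- ppois(y - 1, lambda = mu)

## draw one uniform value within each probability interval [lower, upper]
u   <- runif(length(y))
pit <- lower + u * (upper - lower)

cbind(lower, pit, upper)
```

For a continuous distribution the interval collapses to a single point, so no randomization is needed and the PIT residual is just the CDF at the observation.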

In order to obtain normally distributed residuals for well-fitting models (as is often desired in linear regression models), “quantile” residuals, proposed by Dunn and Smyth (1996), additionally transform the PIT residuals by the standard normal quantile function.
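The link to linear regression can be illustrated in base R. In this minimal sketch with made-up numbers (the values of `y`, `mu`, and `sigma` are hypothetical), the response distribution is Gaussian, so applying the standard normal quantile function to the PIT residuals simply recovers the standardized residuals.

```r
## hypothetical observations, expected means, and standard deviation
y     <- c(-0.3, 1.1, 2.4)
mu    <- c(0.0, 1.0, 2.0)
sigma <- 0.5

## PIT residuals: expected CDF evaluated at the observations
pit <- pnorm(y, mean = mu, sd = sigma)

## quantile residuals: standard normal quantile function applied to the PIT
res_quantile <- qnorm(pit)

## for a Gaussian response these coincide with the standardized residuals
all.equal(res_quantile, (y - mu) / sigma)
```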

As quantile residuals and PIT residuals are subject to randomness for discrete distributions (and also for mixed discrete-continuous distributions), it is sometimes useful to explore the extent of the random variation. This can be done either by obtaining multiple replications (via nsim) or by computing fixed quantiles of each probability interval such as prob = 0.5 (corresponding to mid-quantile residuals, see Feng et al. 2020). Another common setting is prob = c(0, 1) yielding the range of possible residuals.
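Both ways of exploring the random variation can be sketched in base R for a Poisson response. This is an illustration under assumptions, not the package's code: the means `mu` are derived from the response residuals printed in the Examples, `res_fixed` is a hypothetical helper name, and the seed is arbitrary.

```r
set.seed(123)  # arbitrary seed for reproducibility

## observations and expected Poisson means (assumed, see lead-in)
y  <- c(1, 1, 1)
mu <- c(0.8181454, 1.2370397, 1.8704100)

## probability interval for each observation: [P(Y < y), P(Y <= y)]
lower <- ppois(y - 1, lambda = mu)
upper <- ppois(y, lambda = mu)

## hypothetical helper: quantile residuals at fixed positions within the
## intervals, returning one row per observation and one column per prob
res_fixed <- function(prob) qnorm(lower + outer(upper - lower, prob))

res_mid   <- res_fixed(0.5)      # mid-quantile residuals
res_range <- res_fixed(c(0, 1))  # range of possible residuals

## multiple random replications: one uniform draw per observation and replication
nsim    <- 5
u       <- matrix(runif(length(y) * nsim), nrow = length(y), ncol = nsim)
res_sim <- qnorm(lower + u * (upper - lower))

round(drop(res_mid), 3)
```

Up to rounding, the mid-quantile values agree with the `prob = 0.5` output in the Examples, and every simulated residual in `res_sim` lies within the corresponding row of `res_range`.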

Value

A vector or matrix of residuals. A matrix of residuals is returned if more than one replication of quantile or PIT residuals is computed, i.e., if either nsim > 1 or prob is specified with length(prob) > 1.

References

Dawid AP (1984). “Present Position and Potential Developments: Some Personal Views: Statistical Theory: The Prequential Approach.” Journal of the Royal Statistical Society A, 147(2), 278–292. doi:10.2307/2981683.

Dunn PK, Smyth GK (1996). “Randomized Quantile Residuals.” Journal of Computational and Graphical Statistics, 5(3), 236–244. doi:10.2307/1390802.

Feng C, Li L, Sadeghpour A (2020). “A Comparison of Residual Diagnosis Tools for Diagnosing Regression Models for Count Data.” BMC Medical Research Methodology, 20(175), 1–21. doi:10.1186/s12874-020-01055-2.

Warton DI, Thibaut L, Wang YA (2017). “The PIT-Trap – A ‘Model-Free’ Bootstrap Procedure for Inference about Regression Models with Discrete, Multivariate Responses.” PLOS ONE, 12(7), 1–18. doi:10.1371/journal.pone.0181790.

Examples

library("topmodels")

## Poisson GLM for FIFA 2018 data
data("FIFA2018", package = "distributions3")
m <- glm(goals ~ difference, data = FIFA2018, family = poisson)

## random quantile residuals (on original data)
proresiduals(m)

           1            2            3            4            5            6 
 1.915644338 -1.105078153 -0.807958596 -0.179311742  1.266353539  0.470575512 
           7            8            9           10           11           12 
-0.752494263 -1.045225233  1.282869075 -1.124877627  1.090942428 -0.539823528 
          13           14           15           16           17           18 
-0.643705716  0.289872127  1.664027076  1.075729933 -0.264916471 -0.771184981 
          19           20           21           22           23           24 
-2.437170350 -1.045453555  0.011902281 -0.068450565  1.088857023 -0.298712135 
          25           26           27           28           29           30 
-0.092781656  0.040978929 -0.854314048  0.153547831 -0.274131009 -0.247002209 
          31           32           33           34           35           36 
-0.817571789 -0.522624791 -0.612034420 -1.036527316 -0.775181710  0.624317835 
          37           38           39           40           41           42 
-0.780984001  0.637865299  0.341489250 -0.851326997 -1.155778583  1.409362082 
          43           44           45           46           47           48 
 0.885634944 -0.966831001  0.625367246 -0.032738602  0.451783240  0.424961257 
          49           50           51           52           53           54 
-1.871463209 -0.079238236 -0.444718221  0.265211278 -0.161201871 -1.901862143 
          55           56           57           58           59           60 
 0.352212934  0.955084300 -1.403623094 -0.173022861  0.558675802  1.224009110 
          61           62           63           64           65           66 
-1.442910607  0.287048754 -0.248590445 -1.075140445 -0.102006275  0.735432718 
          67           68           69           70           71           72 
-0.148785401  0.428322470  1.497000447 -1.584051642 -1.040486363  1.439931999 
          73           74           75           76           77           78 
 0.478389817 -2.074754753 -0.040001057 -0.076802385  1.852219154  1.726367467 
          79           80           81           82           83           84 
 2.273379199 -0.094434871 -0.728706706 -0.074710978  0.006688404  1.008577520 
          85           86           87           88           89           90 
-0.601928161  0.760779782  0.173040854  1.064166915  0.736167373  0.600592817 
          91           92           93           94           95           96 
-0.596536809  1.252235278 -0.795438505 -0.067279907 -1.703060032 -0.414581548 
          97           98           99          100          101          102 
 1.713871282  1.235274189  1.019891801  0.015391816 -0.169494635  0.649544807 
         103          104          105          106          107          108 
-0.079701603  0.545956271  0.409284332 -0.650807985  1.100409421  0.967389502 
         109          110          111          112          113          114 
 0.246228476 -1.204117946 -0.239730292  0.181577362 -0.377170649  0.063799316 
         115          116          117          118          119          120 
-0.293110417  1.011912968 -0.343094697  0.176102847  0.230919606 -0.177637379 
         121          122          123          124          125          126 
-0.318423938 -0.924017495 -0.071730364 -0.104286530  0.492498400 -1.929644217 
         127          128 
 1.515017065  0.697940283

## various flavors of residuals on small new data
nd <- data.frame(goals = c(1, 1, 1), difference = c(-1, 0, 1))

## quantile residuals: random (1 sample), random (5 samples), mid-quantile (non-random)
proresiduals(m, newdata = nd, type = "quantile")

          1           2           3 
 0.25103847 -0.02596762 -0.51383725

proresiduals(m, newdata = nd, type = "quantile", nsim = 5)

            r_1        r_2         r_3        r_4          r_5
[1,] -0.1459758  0.2499621  0.31995611  0.1707342  0.006673901
[2,] -0.2282231 -0.2831749 -0.07696133 -0.2840809  0.171462757
[3,] -0.5027931 -0.4081734 -0.39487938 -0.3886358 -0.542828894

proresiduals(m, newdata = nd, type = "quantile", prob = 0.5)

          1           2           3 
 0.31008612 -0.07586646 -0.52976162

## PIT residuals (without transformation to normal): random vs. minimum/maximum quantile
proresiduals(m, newdata = nd, type = "pit", nsim = 5)

           r_1       r_2       r_3       r_4       r_5
[1,] 0.5650710 0.7654889 0.6667167 0.5448863 0.6624393
[2,] 0.3700829 0.6131877 0.4344202 0.3212836 0.3359922
[3,] 0.2552849 0.3120397 0.2365261 0.1657959 0.4127608

proresiduals(m, newdata = nd, type = "pit", prob = c(0, 1))

           r_0       r_1
[1,] 0.4412492 0.8022553
[2,] 0.2902421 0.6492832
[3,] 0.1540605 0.4422167

## raw response residuals (observation - expected mean)
proresiduals(m, newdata = nd, type = "response")

         1          2          3 
 0.1818546 -0.2370397 -0.8704100

## standardized Pearson residuals (response residuals divided by standard deviation)
proresiduals(m, newdata = nd, type = "pearson")

         1          2          3 
 0.2010523 -0.2131225 -0.6364371

## compute residuals by manually obtaining distribution and response
## proresiduals(procast(m, newdata = nd, drop = TRUE), nd$goals)