proresiduals.Rd
Generic function and default method for computing (randomized) quantile, PIT, Pearson, and raw response residuals based on distributions3 support.
proresiduals(object, ...)
# S3 method for default
proresiduals(
object,
newdata = NULL,
type = c("quantile", "pit", "pearson", "response"),
random = TRUE,
prob = NULL,
delta = NULL,
...
)
object
an object for which a newresponse and a prodist method are available.
...
further parameters passed to methods.
newdata
optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used.
type
character indicating whether quantile (default), PIT, Pearson, or raw response residuals should be computed.
random
logical or numeric. Should random residuals be computed for type = "quantile" and "pit"? The default is TRUE; if set to FALSE, fixed quantiles at the probabilities given in prob are used (defaulting to mid-quantiles). If random > 1, then multiple random replications of quantile or PIT residuals are computed. For other residual types random has no effect.
prob
numeric. Fixed probabilities for the quantile or PIT residuals when random = FALSE.
delta
numeric. The minimal difference used to compute the range of probabilities corresponding to each observation in order to obtain (randomized) "quantile" or "pit" residuals. For NULL, the minimal observed difference in the response divided by 5e-6 is used. Ignored for continuous distributions.
A vector or matrix of residuals. A matrix of residuals is returned if more than one replication of quantile or PIT residuals is computed, i.e., if either random > 1 or random = FALSE and length(prob) > 1.
The new generic function proresiduals comes with a powerful default method that is based on the following idea: newresponse and prodist can be used to extract the observed response and expected distribution for it, respectively. For all model classes that have methods for these two generic functions, proresiduals can compute a range of different types of residuals.
The simplest residuals are the so-called "response" residuals, which simply compute the difference between the observations and the expected means. The "pearson" residuals additionally standardize these residuals by the square root of the expected variance. Thus, these residuals are based only on the first moment and on the first two moments, respectively.
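As a minimal sketch of these two definitions, the following base R code computes response and Pearson residuals by hand and compares them with the residuals() method for glm objects. It assumes a Poisson model (where the expected variance equals the expected mean) and uses simulated data; the names d, m0, mu, and v are illustrative and not part of proresiduals.

```r
## simulated Poisson data (illustrative only, not a data set from this page)
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- rpois(50, lambda = exp(0.5 + 0.3 * d$x))
m0 <- glm(y ~ x, data = d, family = poisson)

## expected mean and expected variance
mu <- fitted(m0)
v <- mu  ## assumption: Poisson, so the variance equals the mean

## response residuals: observation minus expected mean
r_response <- d$y - mu
## Pearson residuals: additionally divide by the expected standard deviation
r_pearson <- r_response / sqrt(v)

## compare with the residuals() method from the stats package
all.equal(r_response, residuals(m0, type = "response"))
all.equal(r_pearson, residuals(m0, type = "pearson"))
```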
To assess the entire distribution and not just the first moments, there are also
residuals based on the probability integral transform (PIT).
For regression models with a continuous response distribution, "pit" residuals (see Warton et al. 2017) are simply the expected cumulative distribution function (CDF) evaluated at the observations (Dawid 1984). For discrete distributions, a uniform random value is drawn from the range of probabilities between the CDF at the observation and the supremum of the CDF to the left of it. If the model fits well, the PIT residuals should be uniformly distributed.
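The randomized PIT for a discrete distribution can be sketched in base R as follows. This is an illustration under stated assumptions, not the internal code of proresiduals: the counts y and Poisson means mu are hypothetical values, and for integer counts the left limit of the CDF at y is taken to be the CDF at y - 1.

```r
## hypothetical counts and expected Poisson means (illustrative values)
y  <- c(0, 1, 3, 2)
mu <- c(0.8, 1.2, 1.9, 2.5)

## CDF at the observation and its left limit (CDF at y - 1 for counts)
p_upper <- ppois(y, lambda = mu)
p_lower <- ppois(y - 1, lambda = mu)  ## equals 0 for y = 0

## randomized PIT: one uniform draw per observation within [p_lower, p_upper]
set.seed(0)
pit <- runif(length(y), min = p_lower, max = p_upper)
pit
```

Repeating the runif() call yields different PIT values for the same observations, which is the random variation that multiple replications are meant to reveal.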
In order to obtain normally distributed residuals for well-fitting models (as is often desired in linear regression models), "quantile" residuals, proposed by Dunn and Smyth (1996), additionally transform the PIT residuals by the standard normal quantile function.
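For a continuous distribution, the transformation can be sketched in a few lines of base R. The example assumes, purely for illustration, that the expected distribution of every observation is normal with mean 2 and standard deviation 3; in that case the quantile residuals coincide with the usual standardized residuals.

```r
## simulated observations from the assumed N(2, 3^2) distribution
set.seed(1)
y <- rnorm(5, mean = 2, sd = 3)

## PIT residuals: expected CDF evaluated at the observations
pit <- pnorm(y, mean = 2, sd = 3)

## quantile residuals: standard normal quantile function applied to the PIT
qres <- qnorm(pit)

## for a normal distribution this is just (y - mean) / sd
all.equal(qres, (y - 2) / 3)
```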
As quantile residuals and PIT residuals are subject to randomness for discrete distributions
(and also for mixed discrete-continuous distributions), it is sometimes
useful to explore the extent of the random variation by obtaining multiple replications.
In proresiduals this can be achieved by setting random > 1. Alternatively, the randomness can be suppressed via random = FALSE and then only one (or more) fixed quantile(s) of each probability interval is returned. The default is prob = 0.5 which corresponds to mid-quantile residuals (see Feng et al. 2020). Another common setting is prob = c(0, 1) which yields the range of possible residuals.
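The fixed-probability variant can be sketched in base R for a single observation. The sketch assumes that a fixed quantile is obtained by moving the fraction prob along the probability interval between the left limit of the CDF and the CDF at the observation; the observation y = 1 and the Poisson mean 0.8 are hypothetical values, and the r_ names only mimic the column labels used for matrix output.

```r
## one hypothetical observation with an expected Poisson mean of 0.8
y <- 1
mu <- 0.8

## probability interval for the observation
p_lower <- ppois(y - 1, lambda = mu)
p_upper <- ppois(y, lambda = mu)

## fixed positions within the interval: minimum, midpoint, maximum
prob <- c(0, 0.5, 1)
p_fixed <- p_lower + prob * (p_upper - p_lower)

## transform to the normal scale to obtain fixed quantile residuals
qres <- qnorm(p_fixed)
names(qres) <- paste0("r_", prob)
qres
```

The first and last elements bound every randomized quantile residual that could be drawn for this observation, and the middle element is the mid-quantile residual.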
Dawid AP (1984). “Present Position and Potential Developments: Some Personal Views: Statistical Theory: The Prequential Approach.” Journal of the Royal Statistical Society A, 147(2), 278--292. doi:10.2307/2981683
Dunn PK, Smyth GK (1996). “Randomized Quantile Residuals.” Journal of Computational and Graphical Statistics, 5(3), 236--244. doi:10.2307/1390802
Feng C, Li L, Sadeghpour A (2020). “A Comparison of Residual Diagnosis Tools for Diagnosing Regression Models for Count Data.” BMC Medical Research Methodology, 20(175), 1--21. doi:10.1186/s12874-020-01055-2
Warton DI, Thibaut L, Wang YA (2017). “The PIT-Trap -- A ‘Model-Free’ Bootstrap Procedure for Inference about Regression Models with Discrete, Multivariate Responses.” PLOS ONE, 12(7), 1--18. doi:10.1371/journal.pone.0181790
## Poisson GLM for FIFA 2018 data
data("FIFA2018", package = "distributions3")
m <- glm(goals ~ difference, data = FIFA2018, family = poisson)
## random quantile residuals
proresiduals(m)
#> 1 2 3 4 5 6
#> 2.121391343 -0.870829887 -0.409313713 -0.398996087 1.279323226 0.423852923
#> 7 8 9 10 11 12
#> -0.721350510 -1.003574008 1.102676448 -0.546669720 0.962804005 -0.316228552
#> 13 14 15 16 17 18
#> -0.739981117 0.273003421 2.063776099 1.428015088 -0.453951606 -1.610287017
#> 19 20 21 22 23 24
#> -1.411194908 -0.934977924 -0.058239246 -0.821002708 1.627328789 -0.046773847
#> 25 26 27 28 29 30
#> 0.343772467 0.481905086 -0.869549229 -0.564616712 0.139767620 0.509241780
#> 31 32 33 34 35 36
#> -0.485255437 -1.375145617 -0.454371204 -1.949937093 -1.577362085 0.644553913
#> 37 38 39 40 41 42
#> -0.542332525 0.575688850 0.773677653 -2.234759032 -0.926803673 1.442863660
#> 43 44 45 46 47 48
#> 0.985674715 -1.264427978 0.466815688 0.303144930 -0.127912344 0.446301302
#> 49 50 51 52 53 54
#> -0.780019834 0.090552169 -0.722048020 0.246295435 -0.037847805 -0.897378928
#> 55 56 57 58 59 60
#> -0.392254615 0.631771062 -1.222177367 0.090525677 0.377167382 1.094367969
#> 61 62 63 64 65 66
#> -2.599488279 0.381903891 -0.054256968 -1.246356958 -0.137317322 0.779801188
#> 67 68 69 70 71 72
#> 0.315399157 0.042437121 1.683986030 -3.062763544 -1.490724714 1.764524040
#> 73 74 75 76 77 78
#> 0.556259169 -0.254498696 0.559267677 -0.074324329 1.630638756 1.399866470
#> 79 80 81 82 83 84
#> 2.211817402 0.331909157 -1.853638892 -0.373712781 -0.022654464 0.447119239
#> 85 86 87 88 89 90
#> 0.021875629 1.112887630 0.076203550 0.613008422 0.466150146 0.355965917
#> 91 92 93 94 95 96
#> -1.331674057 1.372869120 -0.653998253 0.122785883 -1.212792623 -0.203224183
#> 97 98 99 100 101 102
#> 1.808338824 1.411707437 0.812780327 0.146983853 -0.060088635 0.593423621
#> 103 104 105 106 107 108
#> -0.495442467 0.144175587 0.447625347 -0.598954092 0.926853890 1.357653473
#> 109 110 111 112 113 114
#> -0.268520589 -1.051176079 -0.331889615 -0.312948506 -0.792573244 0.316483384
#> 115 116 117 118 119 120
#> 0.125058985 0.863613275 -0.548998622 0.722704203 0.221340782 0.174326845
#> 121 122 123 124 125 126
#> -0.120542079 -1.018568723 -0.344973548 -0.003119137 0.351795087 -0.628584783
#> 127 128
#> 1.810441882 1.150576905
## Pearson residuals
proresiduals(m, type = "pearson")
#> 1 2 3 4 5 6
#> 2.430654322 -0.930334590 -1.014724759 -0.398804262 1.305927102 -0.064013188
#> 7 8 9 10 11 12
#> -0.613884092 -0.914301665 1.519090649 -1.093055385 0.993556817 -0.424264684
#> 13 14 15 16 17 18
#> -1.113056199 -0.211616188 2.038308437 1.153169320 -0.546750414 -0.944255416
#> 19 20 21 22 23 24
#> -0.840278169 -0.792913544 0.116280482 -0.548303440 1.537474481 -0.111528206
#> 25 26 27 28 29 30
#> -0.099738912 0.339830526 -1.079060716 -0.274111521 -0.360546190 -0.066833526
#> 31 32 33 34 35 36
#> -0.691375999 -0.881237871 -0.908320460 -1.361897916 -1.065751519 0.562346952
#> 37 38 39 40 41 42
#> -0.625851688 0.190909297 0.375971301 -0.998686051 -1.222716781 1.953550708
#> 43 44 45 46 47 48
#> 0.688071124 -1.113023052 0.192354231 0.107004865 0.001191279 0.378023421
#> 49 50 51 52 53 54
#> -1.048037832 -0.333124464 -0.702612840 0.264228995 -0.223024138 -0.808461816
#> 55 56 57 58 59 60
#> -0.170029883 0.623708821 -0.857974497 -0.054672753 0.452546135 0.923900344
#> 61 62 63 64 65 66
#> -1.398279331 0.245655957 -0.333908002 -1.047632914 -0.051316840 0.453072234
#> 67 68 69 70 71 72
#> -0.027001956 0.287781642 1.634605655 -1.515806191 -1.135687571 1.664963346
#> 73 74 75 76 77 78
#> 0.627910298 -0.855301681 0.204392856 0.090052537 2.096199105 1.427808319
#> 79 80 81 82 83 84
#> 2.910552914 0.239342145 -1.071915077 -0.287530123 -0.178220460 0.635530844
#> 85 86 87 88 89 90
#> -0.438382284 1.014266304 -0.339381957 0.869416029 0.747455937 0.624804291
#> 91 92 93 94 95 96
#> -1.082035785 1.480840871 -1.023025797 -0.382201924 -1.016451750 -0.395336915
#> 97 98 99 100 101 102
#> 2.319462548 1.720582549 0.734690903 -0.246750527 -0.518360463 0.087311027
#> 103 104 105 106 107 108
#> -0.359633135 -0.067732640 0.018629138 -0.880499408 0.854954441 1.281307943
#> 109 110 111 112 113 114
#> -0.180142894 -1.130625614 -0.111669780 -0.315120066 -0.972503579 0.300292121
#> 115 116 117 118 119 120
#> -0.390918669 0.944711115 -0.976915794 0.313170650 -0.169708261 -0.256636287
#> 121 122 123 124 125 126
#> -0.332373025 -1.048426333 -0.144779744 -0.281712175 0.578986159 -1.071915077
#> 127 128
#> 1.891273576 1.071258286
## various flavors of residuals on small new data
nd <- data.frame(goals = c(1, 1, 1), difference = c(-1, 0, 1))
## random quantile residuals
set.seed(0)
proresiduals(m, newdata = nd, type = "quantile")
#> 1 2 3
#> 0.7223565 -0.2908823 -0.6393725
set.seed(0)
proresiduals(m, newdata = nd, type = "quantile", random = 5)
#> r_1 r_2 r_3 r_4 r_5
#> [1,] 0.7223565 0.3800686 0.7243461 0.4353980 0.01250164
#> [2,] -0.2908823 0.2958457 0.3303159 -0.4889856 0.09265594
#> [3,] -0.6393725 -0.7988924 -0.4002845 -0.7946336 -0.62879276
## underlying probability integral transform (PIT) without transformation to normal
set.seed(0)
proresiduals(m, newdata = nd, type = "pit", random = 5)
#> r_1 r_2 r_3 r_4 r_5
#> [1,] 0.7649623 0.6480528 0.7655733 0.6683632 0.5049873
#> [2,] 0.3855707 0.6163260 0.6294193 0.3124259 0.5369116
#> [3,] 0.2612903 0.2121764 0.3444735 0.2134133 0.2647424
## raw response residuals (observation - expected mean)
proresiduals(m, newdata = nd, type = "response")
#> 1 2 3
#> 0.1818546 -0.2370397 -0.8704100
## standardized Pearson residuals (additionally divide by standard deviation)
proresiduals(m, newdata = nd, type = "pearson")
## (non-random) mid-quantile residuals
proresiduals(m, newdata = nd, type = "quantile", random = FALSE)
#> 1 2 3
#> 0.31008612 -0.07586646 -0.52976162
## minimum/median/maximum quantile residuals
proresiduals(m, newdata = nd, type = "quantile", random = FALSE, prob = c(0, 0.5, 1))
#> r_0 r_0.5 r_1
#> [1,] -0.1478027 0.31008612 0.8497044
#> [2,] -0.5526775 -0.07586646 0.3833860
#> [3,] -1.0191728 -0.52976162 -0.1453513
## compute residuals by manually obtaining distribution and response
## proresiduals(procast(m, newdata = nd, drop = TRUE), nd$goals)