Usage
crps.distribution(
y,
x,
drop = TRUE,
elementwise = NULL,
gridsize = 500L,
batchsize = 10000L,
applyfun = NULL,
cores = NULL,
method = NULL,
...
)
crps.Beta(y, x, drop = TRUE, elementwise = NULL, ...)
crps.Bernoulli(y, x, drop = TRUE, elementwise = NULL, ...)
crps.Binomial(y, x, drop = TRUE, elementwise = NULL, ...)
crps.Erlang(y, x, drop = TRUE, elementwise = NULL, ...)
crps.Exponential(y, x, drop = TRUE, elementwise = NULL, ...)
crps.Gamma(y, x, drop = TRUE, elementwise = NULL, ...)
crps.GEV(y, x, drop = TRUE, elementwise = NULL, ...)
crps.Geometric(y, x, drop = TRUE, elementwise = NULL, ...)
crps.Gumbel(y, x, drop = TRUE, elementwise = NULL, ...)
crps.HyperGeometric(y, x, drop = TRUE, elementwise = NULL, ...)
crps.Logistic(y, x, drop = TRUE, elementwise = NULL, ...)
crps.LogNormal(y, x, drop = TRUE, elementwise = NULL, ...)
crps.NegativeBinomial(y, x, drop = TRUE, elementwise = NULL, ...)
crps.Normal(y, x, drop = TRUE, elementwise = NULL, ...)
crps.Poisson(y, x, drop = TRUE, elementwise = NULL, ...)
crps.StudentsT(y, x, drop = TRUE, elementwise = NULL, ...)
crps.Uniform(y, x, drop = TRUE, elementwise = NULL, ...)
crps.XBetaX(y, x, drop = TRUE, elementwise = NULL, method = "cdf", ...)
crps.GAMLSS(y, x, drop = TRUE, elementwise = NULL, ...)
crps.BAMLSS(y, x, drop = TRUE, elementwise = NULL, ...)
Details
The (continuous) ranked probability score (CRPS) for (univariate) probability distributions can be computed based on the object-oriented infrastructure provided by the distributions3 package. The general crps.distribution method does so by numeric integration based on the cdf and/or quantile methods (for more details see below). Additionally, if the scoringRules package provides a dedicated closed-form CRPS computation for the specified distribution, it is used instead because it is both computationally faster and numerically more precise. For example, the crps method for Normal objects leverages crps_norm rather than relying on numeric integration.
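To illustrate why a closed form is preferable when one exists, the following base R sketch compares the well-known closed-form CRPS of a normal distribution with a direct numeric integration of the CRPS integrand. The function names crps_normal_closed and crps_normal_integrate are hypothetical helpers for this illustration; they are not part of this package or of scoringRules.

```r
# Closed-form CRPS of a normal distribution (standard textbook result).
crps_normal_closed <- function(x, mu = 0, sigma = 1) {
  z <- (x - mu) / sigma
  sigma * (z * (2 * pnorm(z) - 1) + 2 * dnorm(z) - 1 / sqrt(pi))
}

# Same quantity by numeric integration of (F(z) - 1{z >= x})^2.
# The integrand jumps at x, so the two sides are integrated separately.
crps_normal_integrate <- function(x, mu = 0, sigma = 1) {
  left <- integrate(function(z) pnorm(z, mu, sigma)^2, -Inf, x)$value
  right <- integrate(
    function(z) pnorm(z, mu, sigma, lower.tail = FALSE)^2, x, Inf
  )$value
  left + right
}

closed <- crps_normal_closed(0.5)
approx_val <- crps_normal_integrate(0.5)
abs(closed - approx_val)  # small, but the closed form needs no quadrature
```

The closed form is a handful of vectorized arithmetic operations, whereas the integration route needs many cdf evaluations per observation.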
The general method for any distribution object uses the following strategy for numerical CRPS computation. By default (if the method argument is NULL), it uses is_continuous and is_discrete to distinguish between distributions whose entire support is continuous, distributions whose entire support is discrete, and mixed discrete-continuous distributions.
For continuous and mixed distributions, an equidistant grid of gridsize + 5 probabilities is drawn, and the corresponding quantiles are calculated for each distribution in y (with the observation x included in the grid). The CRPS is then computed by numeric integration using a trapezoidal approximation. For discrete distributions, gridsize equidistant quantiles (in steps of 1) are drawn, the corresponding probabilities are calculated from the cdf for each distribution in y (again including the observation x), and the CRPS is computed by numeric integration. If gridsize quantiles in steps of 1 are not sufficient to cover the required range, the method falls back to the procedure used for continuous and mixed distributions to approximate the CRPS.
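The two grid strategies can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the probability grid layout (without the extra "+ 5" points), the coverage check with tolerance tol, the lower argument, and the helper names trapz, crps_quantile_grid and crps_step_grid are all hypothetical. The discrete sketch also assumes an integer-valued observation.

```r
# Trapezoidal rule on possibly unequally spaced nodes.
trapz <- function(z, g) sum(diff(z) * (g[-1] + g[-length(g)]) / 2)

# Continuous case: probability grid -> quantiles, observation added as a node.
# qfun and pfun are the quantile and cdf functions of one distribution.
crps_quantile_grid <- function(x, qfun, pfun, gridsize = 500L) {
  p <- seq_len(gridsize) / (gridsize + 1)  # assumed layout inside (0, 1)
  z <- c(qfun(p), x)
  Fz <- c(p, pfun(x))
  ord <- order(z)
  z <- z[ord]
  Fz <- Fz[ord]
  # The integrand (F(z) - 1{z >= x})^2 jumps at x, so integrate each side.
  trapz(z[z <= x], Fz[z <= x]^2) + trapz(z[z >= x], (1 - Fz[z >= x])^2)
}

# Discrete case: integer grid in steps of 1, probabilities from the cdf.
# Returns NA if the grid does not cover the required range, signalling that
# the caller should fall back to the quantile-grid approach.
crps_step_grid <- function(x, pfun, lower = 0, gridsize = 500L, tol = 1e-10) {
  k <- seq(min(lower, floor(x)), by = 1, length.out = gridsize)
  Fk <- pfun(k)
  if (max(k) < x || 1 - Fk[gridsize] > tol) return(NA_real_)
  # The integrand is constant on [k, k + 1), so the integral is a plain sum.
  sum((Fk - (k >= x))^2)
}

crps_quantile_grid(0.5, qnorm, pnorm)           # standard normal, x = 0.5
crps_step_grid(2, function(k) ppois(k, 3))      # Poisson(3), x = 2
```

In the discrete case, a grid that is too short for the distribution (for example, three integers for a Poisson distribution with mean 50) yields NA in this sketch, which mirrors the fallback described above.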
If the method argument is set to either "cdf" or "quantile", the corresponding strategy for setting up the grid of observations and probabilities is enforced. This can be useful if, for a certain distribution class, only a cdf or only a quantile method is available, or if only one of them is numerically stable or computationally efficient.
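As a rough illustration of what a cdf-only strategy involves, the sketch below places the grid on the observation scale and evaluates nothing but the cdf. The helper crps_cdf_grid and its lower and upper arguments are hypothetical; how the package itself chooses the grid range is not described here, so the bounds are simply supplied by hand.

```r
# Sketch of a cdf-based strategy for a continuous distribution.
# `lower` and `upper` (assumed arguments) bound the region where the
# integrand (F(z) - 1{z >= x})^2 is non-negligible.
crps_cdf_grid <- function(x, pfun, lower, upper, gridsize = 500L) {
  # Equidistant grid on the observation scale, with x added as a node.
  z <- sort(c(seq(lower, upper, length.out = gridsize), x))
  Fz <- pfun(z)
  trapz <- function(z, g) sum(diff(z) * (g[-1] + g[-length(g)]) / 2)
  # Integrate each side of the jump at x separately.
  trapz(z[z <= x], Fz[z <= x]^2) + trapz(z[z >= x], (1 - Fz[z >= x])^2)
}

# Only pnorm is needed here; no quantile function is involved.
crps_cdf_grid(0.5, pnorm, lower = -8, upper = 8)
```

A quantile-based strategy works the other way around: it fixes the probabilities and asks the distribution for the matching grid points, which removes the need to guess a sensible range.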
The numeric approximation requires setting up a matrix of dimension length(y) * (gridsize + 5) (or length(y) * (gridsize + 1)), which may be very memory intensive if length(y) and/or gridsize are large. Therefore, the data is split into batches of (approximately) equal size, none larger than batchsize. This reduces the memory requirement to batchsize * (gridsize + 5) in each step. A smaller value of batchsize thus reduces the memory footprint but slightly increases computation time.
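The batching idea can be sketched as follows. The helper make_batches is hypothetical and only shows one way to obtain batches of approximately equal size that never exceed batchsize; the memory figure assumes the matrix stores double-precision values (8 bytes each).

```r
# Split indices 1..n into contiguous batches of (approximately) equal size,
# none larger than `batchsize`.
make_batches <- function(n, batchsize = 10000L) {
  nb <- ceiling(n / batchsize)  # smallest number of batches that fits
  split(seq_len(n), ceiling(seq_len(n) * nb / n))
}

batches <- make_batches(25L, batchsize = 10L)
lengths(batches)  # three batches, none larger than 10

# Approximate size in megabytes of one batch matrix with the defaults
# batchsize = 10000 and gridsize = 500, assuming 8-byte doubles.
10000 * (500 + 5) * 8 / 1e6
```

Using the smallest feasible number of batches keeps the per-batch overhead low while still respecting the memory bound.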
The error (deviation between the numerical approximation and the analytic solution) has been shown to be on the order of 1e-2 for a series of distributions tested. Accuracy increases with a larger gridsize and decreases with a smaller one.
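The effect of gridsize on accuracy can be explored with a small experiment. The sketch below uses a simplified quantile-grid approximation for a standard normal distribution (the hypothetical helper crps_grid, with an assumed grid layout) and compares it against the closed-form value; it is meant to show the trend, not to reproduce the package's exact error figures.

```r
# Simplified quantile-grid CRPS approximation for a standard normal.
crps_grid <- function(x, gridsize) {
  p <- seq_len(gridsize) / (gridsize + 1)  # assumed layout inside (0, 1)
  z <- c(qnorm(p), x)
  Fz <- c(p, pnorm(x))
  ord <- order(z)
  z <- z[ord]
  Fz <- Fz[ord]
  trapz <- function(z, g) sum(diff(z) * (g[-1] + g[-length(g)]) / 2)
  trapz(z[z <= x], Fz[z <= x]^2) + trapz(z[z >= x], (1 - Fz[z >= x])^2)
}

# Closed-form reference for a standard normal at x = 0.5.
ref <- 0.5 * (2 * pnorm(0.5) - 1) + 2 * dnorm(0.5) - 1 / sqrt(pi)

# Absolute error for increasing grid sizes.
gridsizes <- c(50L, 500L, 5000L)
errs <- sapply(gridsizes, function(n) abs(crps_grid(0.5, n) - ref))
data.frame(gridsize = gridsizes, abs_error = errs)
```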
For parallelization of the numeric computations, a suitable applyfun can be provided that carries out the integration for each element of y. To facilitate setting up a suitable applyfun using the basic parallel package, the argument cores is provided for convenience. When used, y is split into B batches of (approximately) equal size, where B is at least cores or a multiple of cores, such that no batch is larger than batchsize. On systems running Windows, parLapply is used; otherwise, mclapply.
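The platform switch and the batch count can be sketched with the parallel package as follows. Both helpers, n_batches and parallel_apply, are hypothetical; in particular, n_batches encodes one plausible reading of the rule above (the smallest multiple of cores for which no batch exceeds batchsize) and may differ from what the package actually does.

```r
library(parallel)

# Assumed rule: smallest multiple of `cores` such that n / B <= batchsize.
n_batches <- function(n, batchsize, cores) {
  cores * max(1, ceiling(n / (cores * batchsize)))
}

# Apply `fun` to each batch, using parLapply on Windows and mclapply elsewhere.
parallel_apply <- function(batches, fun, cores = 2L) {
  if (.Platform$OS.type == "windows") {
    cl <- makeCluster(cores)
    on.exit(stopCluster(cl))
    parLapply(cl, batches, fun)
  } else {
    mclapply(batches, fun, mc.cores = cores)
  }
}

n_batches(25000, batchsize = 10000, cores = 4)   # one batch per core
n_batches(100000, batchsize = 10000, cores = 4)  # several batches per core

# Toy workload: sum the indices in each of four batches.
batches <- split(1:20, rep(1:4, each = 5))
res <- parallel_apply(batches, function(i) sum(i), cores = 2L)
unlist(res, use.names = FALSE)
```

mclapply relies on forking, which is not available on Windows; this is why a socket cluster with parLapply is the usual substitute there.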