Scoring Probabilistic Forecasts — proscore • topmodels

Generic function and default method for computing various kinds of scores for fitted or predicted probability distributions from (regression) models.

proscore(object, newdata = NULL, ...)

# S3 method for default
proscore(
  object,
  newdata = NULL,
  na.action = na.pass,
  type = c("logs", "crps"),
  aggregate = TRUE,
  drop = FALSE,
  ...
)

Arguments

object: a fitted model object. For the default method this needs to have a prodist and a newresponse method.
newdata: optionally, a data frame in which to look for variables with which to predict and from which to obtain the response variable. If omitted, the original observations are used.
...: further parameters passed to the aggregate function (if any).
na.action: function determining what should be done with missing values in newdata. The default is to employ NA.
type: character specifying the type of score to compute. Avaible types: "logs" (or equivalently "log-score"), "loglikelihood" (or equivalently "log_pdf"), "CRPS" (or equivalently "RPS"), "MAE", "MSE", "DSS" (or equivalently "Dawid-Sebastiani"). Upper or lower case spellings can be used interchangably, hyphens or underscores can be included or omitted. Setting type = NULL yields all available scores.
aggregate: logical or function to be used for aggregating scores across observations. Setting aggregate = TRUE (the default) corresponds to using mean.
drop: logical. Should scores be returned in a data frame (default) or (if possible) dropped to a vector?

Value

Either a data.frame of scores (if drop = FALSE, default) or a named numeric vector (if drop = TRUE and the scores are not a matrix). The names are the type specified by the user (i.e., are not canonicalized by partial matching etc.).

Details

The function proscore provides a unified framework for scoring probabilistic forecasts (in-sample or out-of-sample). The following scores are currently available, using the following notation: $Y$ is the predicted random variable with cumulative distribution function $F(\cdot)$ and probability density function $f(\cdot)$. The corresponding expectation and variance are denoted by $E(Y)$ and $V(Y)$. The actual observation is $y$.

Log-score: Also known as logarithmic score. This is the negative log-likelihood where the negative sign has the effect that smaller values indicate a better fit.

$$ -\log f(y) $$

Log-likelihood: Also known as log-density. Clearly, this is equivalent to the log-score above but using the conventional sign where bigger values indicate a better fit.

$$ \log f(y) $$

Continuous ranked probability score (CRPS):

$$ \int_{-\infty}^{\infty} \left( F(x) - 1(x \ge y) \right)^2 \mbox{d} x $$

where $1(\cdot)$ denotes the indicator function.

In case of a discrete rather than a continuous distribution, the ranked probability score (RPS) is defined analogously using the sum rather than the integral. In other words it is then the sum of the squared deviations between the predicted cumulative probabilities $F(x)$ and the ideal step function for the actual observation $y$.

Mean absolute error (MAE):

$$ \left| y - E(Y) \right| $$

Mean squared error (MSE):

$$ \left( y - E(Y) \right)^2 $$

Dawid-Sebastiani score (DSS):

$$ \frac{\left( y - E(Y) \right)^2}{V(Y)} + \log(V(Y)) $$

Internally, the default proscore method first computes the fitted/predicted probability distribution object using prodist (corresponding to $Y$ above) and then obtains the corresponding observation $y$ using newresponse. Subsequently, the scores are evaluated using either the log_pdf method, crps method, or simply the mean. Finally, the resulting individual scores per observation can be returned as a full data frame, or aggregated (e.g., by using mean, sum, or summary, etc.).

Examples

## Poisson regression model for FIFA 2018 data:
## number of goals scored by each team in each game, explained by
## predicted ability difference of the competing teams
data("FIFA2018", package = "distributions3")
m <- glm(goals ~ difference, data = FIFA2018, family = poisson)

## default: in-sample mean log-score and CRPS
proscore(m)
#>       logs      crps
#> 1 1.388258 0.5619936

## element-wise score using a new data set (final of the tournament)
final <- tail(FIFA2018, 2)
proscore(m, newdata = final, aggregate = FALSE)
#>         logs      crps
#> 127 2.891437 1.7744030
#> 128 1.741564 0.7205361

## replicate in-sample log-likelihood
proscore(m, type = "loglik", aggregate = sum)
#>      loglik
#> 1 -177.6971
logLik(m)
#> 'log Lik.' -177.6971 (df=2)

## compute mean of all available scores
proscore(m, type = NULL)
#>       logs loglikelihood      crps       mae      mse      dss
#> 1 1.388258     -1.388258 0.5619936 0.8320441 1.162032 1.085419

## upper vs. lower case spelling is matched internally but preserved in output
proscore(m, type = c("logs", "crps"))
#>       logs      crps
#> 1 1.388258 0.5619936
proscore(m, type = c("Log-score", "CRPS"))
#>   Log-score      CRPS
#> 1  1.388258 0.5619936

## least-squares regression for speed and breaking distance of cars
data("cars", package = "datasets")
m <- lm(dist ~ speed, data = cars)

## replicate in-sample log-likelihood and residual sum of squares
## (aka deviance) by taking the sum (rather than the mean) of the
## log-density and squared errors, respectively
proscore(m, type = c("loglik", "MSE"), aggregate = sum)
#>      loglik      MSE
#> 1 -206.5784 11353.52
logLik(m)
#> 'log Lik.' -206.5784 (df=3)
deviance(m)
#> [1] 11353.52