Procast: Probabilistic Forecasting — procast • topmodels

Generic function and methods for computing various kinds of probabilistic forecasts from (regression) models.

procast(
  object,
  newdata = NULL,
  na.action = na.pass,
  type = "distribution",
  at = 0.5,
  drop = FALSE,
  ...
)

# S3 method for default
procast(
  object,
  newdata = NULL,
  na.action = na.pass,
  type = c("distribution", "mean", "variance", "quantile", "probability", "density",
    "loglikelihood", "parameters", "kurtosis", "skewness"),
  at = 0.5,
  drop = FALSE,
  ...
)

# S3 method for lm
procast(
  object,
  newdata = NULL,
  na.action = na.pass,
  type = "distribution",
  at = 0.5,
  drop = FALSE,
  ...,
  sigma = "ML"
)

# S3 method for glm
procast(
  object,
  newdata = NULL,
  na.action = na.pass,
  type = "distribution",
  at = 0.5,
  drop = FALSE,
  ...,
  dispersion = NULL
)

# S3 method for bamlss
procast(
  object,
  newdata = NULL,
  na.action = na.pass,
  type = "distribution",
  at = 0.5,
  drop = FALSE,
  ...,
  distributions3 = FALSE
)

# S3 method for disttree
procast(
  object,
  newdata = NULL,
  na.action = na.pass,
  type = "distribution",
  at = 0.5,
  drop = FALSE,
  ...,
  distributions3 = FALSE
)

Arguments

object: a fitted model object. For the default method this needs to have a prodist method (or object can inherit from distribution directly).
newdata: optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used.
na.action: function determining what should be done with missing values in newdata. The default is to employ NA.
type: character specifying the type of probabilistic forecast to compute. Note that type = "probability" corresponds to cumulative probability as in pnorm, pbinom, etc.
at: specification of values at which the forecasts should be evaluated, typically a numeric vector but possibly also a matrix or data frame. Additionally, at can be the character string "function" or "list", see details below.
drop: logical. Should forecasts be returned in a data frame (default) or (if possible) dropped to a vector, see return value description below.
...: further parameters passed to methods. In particular, this includes the logical argument elementwise = NULL. Should each element of distribution only be evaluated at the corresponding element of at (elementwise = TRUE) or at all elements in at (elementwise = FALSE). Elementwise evaluation is only possible if the number of observations is length of at are the same and in that case a vector of the same length is returned. Otherwise a matrix is returned. The default is to use elementwise = TRUE if possible, and otherwise elementwise = FALSE.
sigma: character or numeric or NULL. Specification of the standard deviation sigma to be used for the Normal distribution in the lm method. The default "ML" (or equivalently "MLE" or NULL) uses the maximum likelihood estimate based on the residual sum of squares divided by the number of observations, n. Alternatively, sigma = "OLS" uses the least-squares estimate (divided by the residual degrees of freedom, n - k). Finally, a concrete numeric value can also be specified in sigma.
dispersion: character or numeric or NULL. Specification of the dispersion parameter in the glm method. The default NULL (or equivalently "deviance") is to use the deviance divided by the number of observations, n. Alternatively, dispersion = "Chisquared" uses the Chi-squared statistic divided by the residual degrees of freedom, n - k. Finally, a concrete numeric value can also be specified in dispersion.
distributions3: logical. If a dedicated distributions3 object is available (e.g., such as Normal) and uses the same parameterization, should this be used instead of the general disttree distribution?

Value

Either a data.frame of predictions with the same number of rows as the newdata (or the original observations if that is NULL). If drop = TRUE predictions with just a single column are simplified to a vector and predictions with multiple columns to a matrix.

Details

The function procast provides a unified framework for probabilistic forcasting (or procasting, for short) based on probabilistic (regression) models, also known as distributional regression approaches. Typical types of predictions include quantiles, probabilities, (conditional) expectations, variances, and (log-)densities. Internally, procast methods typically compute the predicted parameters for each observation and then compute the desired outcome for the distributions with the respective parameters.

Some quantities, e.g., the moments of the distribution (like mean or variance), can be computed directly from the predicted parameters of the distribution while others require an additional argument at which the distribution is evaluated (e.g., the probability of a quantile or an observation of the response).

The default procast method leverages the S3 classes and methods for probability distributions from the distributions3 package. In a first step the predicted probability distribution object is obtained and, by default (type = "distribution"), returned in order to reflect the distributional nature of the forecast. For all other types (e.g., "mean", "quantile", or "density"), the corresponding extractor methods (e.g., mean, quantile, or pdf) are used to compute the desired quantity from the distribution objects. The examples provide some worked illustrations.

Package authors or users, who want to enable procast for new types of model objects, only need to provide a suitable prodist extractor for the predicted probability distribution. Then the default procast works out of the box. However, if the distributions3 package does not support the necessary probability distribution, then it may also be necessary to implement a new distribution objects, see apply_dpqr.

Examples

## load packages
library("topmodels")
library("distributions3")

## Poisson regression model for FIFA 2018 data:
## number of goals scored by each team in each game, explained by
## predicted ability difference of the competing teams
data("FIFA2018", package = "distributions3")
m <- glm(goals ~ difference, data = FIFA2018, family = poisson)

## predicted probability distributions for all matches (in sample)
head(procast(m))
#>                                distribution
#> 1 Poisson distribution (lambda = 1.7680273)
#> 2 Poisson distribution (lambda = 0.8655224)
#> 3 Poisson distribution (lambda = 1.0296663)
#> 4 Poisson distribution (lambda = 1.4861779)
#> 5 Poisson distribution (lambda = 1.4353952)
#> 6 Poisson distribution (lambda = 1.0660948)
head(procast(m, drop = TRUE))
#>                                        1 
#> "Poisson distribution (lambda = 1.7680)" 
#>                                        2 
#> "Poisson distribution (lambda = 0.8655)" 
#>                                        3 
#> "Poisson distribution (lambda = 1.0297)" 
#>                                        4 
#> "Poisson distribution (lambda = 1.4862)" 
#>                                        5 
#> "Poisson distribution (lambda = 1.4354)" 
#>                                        6 
#> "Poisson distribution (lambda = 1.0661)" 

## procasts for new data
## much lower, equal, and much higher ability than opponent
nd <- data.frame(difference = c(-1, 0, 1))

## predicted goal distribution object
goals <- procast(m, newdata = nd, drop = TRUE)
goals
#>                                        1 
#> "Poisson distribution (lambda = 0.8181)" 
#>                                        2 
#> "Poisson distribution (lambda = 1.2370)" 
#>                                        3 
#> "Poisson distribution (lambda = 1.8704)" 

## predicted densities/probabilities for scoring 0, 1, ..., 5 goals
procast(m, newdata = nd, type = "density", at = 0:5)
#>         d_0       d_1       d_2        d_3         d_4         d_5
#> 1 0.4412492 0.3610060 0.1476777 0.04027394 0.008237485 0.001347892
#> 2 0.2902421 0.3590411 0.2220740 0.09157147 0.028319386 0.007006441
#> 3 0.1540605 0.2881563 0.2694852 0.16801593 0.078564672 0.029389630
## by hand
pdf(goals, 0:5)
#>         d_0       d_1       d_2        d_3         d_4         d_5
#> 1 0.4412492 0.3610060 0.1476777 0.04027394 0.008237485 0.001347892
#> 2 0.2902421 0.3590411 0.2220740 0.09157147 0.028319386 0.007006441
#> 3 0.1540605 0.2881563 0.2694852 0.16801593 0.078564672 0.029389630

## means and medians
procast(m, newdata = nd, type = "mean")
#>        mean
#> 1 0.8181454
#> 2 1.2370397
#> 3 1.8704100
procast(m, newdata = nd, type = "quantile", at = 0.5)
#>   quantile
#> 1        1
#> 2        1
#> 3        2
## by hand
mean(goals)
#>         1         2         3 
#> 0.8181454 1.2370397 1.8704100 
quantile(goals, 0.5)
#> 1 2 3 
#> 1 1 2 

## evaluate procast elementwise or for all possible combinations
## of distributions from 'nd' and observations in 'at'
procast(m, newdata = nd, type = "probability", at = 1:3, elementwise = TRUE)
#>   probability
#> 1   0.8022553
#> 2   0.8713572
#> 3   0.8797179
procast(m, newdata = nd, type = "probability", at = 1:3, elementwise = FALSE)
#>         p_1       p_2       p_3
#> 1 0.8022553 0.9499330 0.9902069
#> 2 0.6492832 0.8713572 0.9629287
#> 3 0.4422167 0.7117019 0.8797179

## compute in-sample log-likelihood sum via procast
sum(procast(m, type = "density", at = FIFA2018$goals, log = TRUE))
#> [1] -177.6971
logLik(m)
#> 'log Lik.' -177.6971 (df=2)