procast.Rd
Generic function and methods for computing various kinds of probabilistic forecasts from (regression) models.
procast(
object,
newdata = NULL,
na.action = na.pass,
type = "distribution",
at = 0.5,
drop = FALSE,
...
)
# S3 method for default
procast(
object,
newdata = NULL,
na.action = na.pass,
type = c("distribution", "mean", "variance", "quantile", "probability", "density",
"loglikelihood", "parameters", "kurtosis", "skewness"),
at = 0.5,
drop = FALSE,
...
)
# S3 method for lm
procast(
object,
newdata = NULL,
na.action = na.pass,
type = "distribution",
at = 0.5,
drop = FALSE,
...,
sigma = "ML"
)
# S3 method for glm
procast(
object,
newdata = NULL,
na.action = na.pass,
type = "distribution",
at = 0.5,
drop = FALSE,
...,
dispersion = NULL
)
# S3 method for bamlss
procast(
object,
newdata = NULL,
na.action = na.pass,
type = "distribution",
at = 0.5,
drop = FALSE,
...,
distributions3 = FALSE
)
# S3 method for disttree
procast(
object,
newdata = NULL,
na.action = na.pass,
type = "distribution",
at = 0.5,
drop = FALSE,
...,
distributions3 = FALSE
)
a fitted model object. For the default method this needs to have a prodist
method (or object can inherit from distribution directly).
optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used.
function determining what should be done with missing values in newdata.
The default (na.pass) is to employ NA, i.e., missing values are passed through
unchanged.
character specifying the type of probabilistic forecast to compute. Note that
type = "probability" corresponds to cumulative probability as in pnorm, pbinom, etc.
specification of values at which the forecasts should be evaluated, typically
a numeric vector but possibly also a matrix or data frame. Additionally, at
can be the character string "function" or "list", see details below.
logical. Should forecasts be returned in a data frame (default) or, if possible, dropped to a vector or matrix? See the return value description below.
further parameters passed to methods. In particular, this includes the logical
argument elementwise = NULL. Should each element of the distribution only be
evaluated at the corresponding element of at (elementwise = TRUE) or at all
elements in at (elementwise = FALSE)? Elementwise evaluation is only possible
if the number of observations and the length of at are the same, and in that
case a vector of the same length is returned. Otherwise a matrix is returned.
The default is to use elementwise = TRUE if possible, and otherwise
elementwise = FALSE.
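The two evaluation modes can be sketched in base R, without calling procast itself. The Poisson means in lambda are made-up values for illustration, and ppois() with outer() only mimics what elementwise versus all-combinations evaluation of cumulative probabilities amounts to; it is not meant to reflect how the package implements it.

```r
## hypothetical Poisson means for three observations (illustrative values)
lambda <- c(0.8, 1.2, 1.9)
## points at which the cumulative probabilities are evaluated
at <- 1:3

## elementwise: i-th distribution at i-th element of 'at' -> vector of length 3
p_elementwise <- ppois(at, lambda = lambda)

## all combinations: every distribution at every element of 'at' -> 3 x 3 matrix
p_all <- outer(lambda, at, function(l, q) ppois(q, lambda = l))
dimnames(p_all) <- list(seq_along(lambda), paste0("p_", at))

## the elementwise result is the diagonal of the full matrix
all.equal(p_elementwise, unname(diag(p_all)))
```

With three distributions and three evaluation points both modes are possible; with, say, at = 1:4 only the matrix form would be available.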
character or numeric or NULL. Specification of the standard deviation sigma
to be used for the Normal distribution in the lm method. The default "ML"
(or equivalently "MLE" or NULL) uses the maximum likelihood estimate based on
the residual sum of squares divided by the number of observations, n.
Alternatively, sigma = "OLS" uses the least-squares estimate (divided by the
residual degrees of freedom, n - k). Finally, a concrete numeric value can
also be specified in sigma.
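The two estimators described for sigma can be reproduced by hand with base R. The following is a minimal sketch on the built-in cars data (an arbitrary choice for illustration); the object names sigma_ml and sigma_ols are hypothetical and not part of the package.

```r
## linear model on the built-in 'cars' data
m <- lm(dist ~ speed, data = cars)
rss <- sum(residuals(m)^2)  # residual sum of squares
n <- nobs(m)                # number of observations
k <- length(coef(m))        # number of regression coefficients

sigma_ml  <- sqrt(rss / n)        # as described for sigma = "ML"
sigma_ols <- sqrt(rss / (n - k))  # as described for sigma = "OLS"

## the OLS version is what summary() reports as residual standard error
all.equal(sigma_ols, summary(m)$sigma)
## the ML version is the smaller of the two because it divides by n
sigma_ml < sigma_ols
```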
character or numeric or NULL. Specification of the dispersion parameter in
the glm method. The default NULL (or equivalently "deviance") is to use the
deviance divided by the number of observations, n. Alternatively,
dispersion = "Chisquared" uses the Chi-squared statistic divided by the
residual degrees of freedom, n - k. Finally, a concrete numeric value can also
be specified in dispersion.
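Both dispersion estimates can likewise be computed by hand. This sketch assumes a Gamma regression on the built-in cars data, chosen only for illustration, and takes the Chi-squared statistic to be the sum of squared Pearson residuals; disp_deviance and disp_chisq are hypothetical names.

```r
## Gamma regression on the built-in 'cars' data (illustrative choice of model)
g <- glm(dist ~ speed, data = cars, family = Gamma(link = "log"))
n <- nobs(g)          # number of observations
k <- length(coef(g))  # number of regression coefficients

## deviance divided by n, as described for dispersion = NULL / "deviance"
disp_deviance <- deviance(g) / n

## Pearson Chi-squared statistic divided by n - k,
## as described for dispersion = "Chisquared"
disp_chisq <- sum(residuals(g, type = "pearson")^2) / (n - k)

## the latter matches the dispersion estimate reported by summary()
all.equal(disp_chisq, summary(g)$dispersion)
```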
logical. If a dedicated distributions3 object is available (e.g., Normal) and
uses the same parameterization, should this be used instead of the general
disttree distribution?
A data.frame of predictions with the same number of rows as the newdata (or
the original observations if that is NULL). If drop = TRUE, predictions with
just a single column are simplified to a vector and predictions with multiple
columns to a matrix.
The function procast provides a unified framework for probabilistic
forecasting (or procasting, for short) based on probabilistic (regression)
models, also known as distributional regression approaches. Typical types
of predictions include quantiles, probabilities, (conditional) expectations,
variances, and (log-)densities. Internally, procast methods typically
compute the predicted parameters for each observation and then compute the
desired outcome for the distributions with the respective parameters.
Some quantities, e.g., the moments of the distribution (like mean or variance),
can be computed directly from the predicted parameters of the distribution,
while others require an additional argument at specifying where the
distribution is evaluated (e.g., the probability of a quantile or an
observation of the response).
The default procast method leverages the S3 classes and methods for
probability distributions from the distributions3 package. In a first step
the predicted probability distribution object is obtained and, by default
(type = "distribution"), returned in order to reflect the distributional
nature of the forecast. For all other types (e.g., "mean", "quantile", or
"density"), the corresponding extractor methods (e.g., mean, quantile, or pdf)
are used to compute the desired quantity from the distribution objects. The
examples provide some worked illustrations.
Package authors or users who want to enable procast for new types of model
objects only need to provide a suitable prodist extractor for the predicted
probability distribution. Then the default procast works out of the box.
However, if the distributions3 package does not support the necessary
probability distribution, then it may also be necessary to implement new
distribution objects, see apply_dpqr.
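As a rough sketch of what such an extractor might look like, consider a toy model class. The class mypois, its constructor, and its list elements are hypothetical and exist only for this illustration; the sketch further assumes that the prodist generic and the Poisson constructor from distributions3 are available and that the default procast method picks up a prodist method defined in the workspace.

```r
library("distributions3")
library("topmodels")

## hypothetical toy "model": intercept-only Poisson fit storing the sample mean
mypois <- function(y) {
  structure(list(lambda = mean(y), n = length(y)), class = "mypois")
}

## prodist method: return one Poisson distribution per observation
prodist.mypois <- function(object, ...) {
  Poisson(lambda = rep(object$lambda, object$n))
}

m0 <- mypois(c(0, 1, 2, 1, 3))

## the distribution object itself and a quantity extracted by hand
d0 <- prodist(m0)
mean(d0)

## with the prodist method in place, the default procast method can be tried
procast(m0, type = "mean")
```

A real method would typically also handle newdata and na.action, which this toy example ignores.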
## load packages
library("topmodels")
library("distributions3")
## Poisson regression model for FIFA 2018 data:
## number of goals scored by each team in each game, explained by
## predicted ability difference of the competing teams
data("FIFA2018", package = "distributions3")
m <- glm(goals ~ difference, data = FIFA2018, family = poisson)
## predicted probability distributions for all matches (in sample)
head(procast(m))
#> distribution
#> 1 Poisson distribution (lambda = 1.7680273)
#> 2 Poisson distribution (lambda = 0.8655224)
#> 3 Poisson distribution (lambda = 1.0296663)
#> 4 Poisson distribution (lambda = 1.4861779)
#> 5 Poisson distribution (lambda = 1.4353952)
#> 6 Poisson distribution (lambda = 1.0660948)
head(procast(m, drop = TRUE))
#> 1
#> "Poisson distribution (lambda = 1.7680)"
#> 2
#> "Poisson distribution (lambda = 0.8655)"
#> 3
#> "Poisson distribution (lambda = 1.0297)"
#> 4
#> "Poisson distribution (lambda = 1.4862)"
#> 5
#> "Poisson distribution (lambda = 1.4354)"
#> 6
#> "Poisson distribution (lambda = 1.0661)"
## procasts for new data
## much lower, equal, and much higher ability than opponent
nd <- data.frame(difference = c(-1, 0, 1))
## predicted goal distribution object
goals <- procast(m, newdata = nd, drop = TRUE)
goals
#> 1
#> "Poisson distribution (lambda = 0.8181)"
#> 2
#> "Poisson distribution (lambda = 1.2370)"
#> 3
#> "Poisson distribution (lambda = 1.8704)"
## predicted densities/probabilities for scoring 0, 1, ..., 5 goals
procast(m, newdata = nd, type = "density", at = 0:5)
#> d_0 d_1 d_2 d_3 d_4 d_5
#> 1 0.4412492 0.3610060 0.1476777 0.04027394 0.008237485 0.001347892
#> 2 0.2902421 0.3590411 0.2220740 0.09157147 0.028319386 0.007006441
#> 3 0.1540605 0.2881563 0.2694852 0.16801593 0.078564672 0.029389630
## by hand
pdf(goals, 0:5)
#> d_0 d_1 d_2 d_3 d_4 d_5
#> 1 0.4412492 0.3610060 0.1476777 0.04027394 0.008237485 0.001347892
#> 2 0.2902421 0.3590411 0.2220740 0.09157147 0.028319386 0.007006441
#> 3 0.1540605 0.2881563 0.2694852 0.16801593 0.078564672 0.029389630
## means and medians
procast(m, newdata = nd, type = "mean")
#> mean
#> 1 0.8181454
#> 2 1.2370397
#> 3 1.8704100
procast(m, newdata = nd, type = "quantile", at = 0.5)
#> quantile
#> 1 1
#> 2 1
#> 3 2
## by hand
mean(goals)
#> 1 2 3
#> 0.8181454 1.2370397 1.8704100
quantile(goals, 0.5)
#> 1 2 3
#> 1 1 2
## evaluate procast elementwise or for all possible combinations
## of distributions from 'nd' and observations in 'at'
procast(m, newdata = nd, type = "probability", at = 1:3, elementwise = TRUE)
#> probability
#> 1 0.8022553
#> 2 0.8713572
#> 3 0.8797179
procast(m, newdata = nd, type = "probability", at = 1:3, elementwise = FALSE)
#> p_1 p_2 p_3
#> 1 0.8022553 0.9499330 0.9902069
#> 2 0.6492832 0.8713572 0.9629287
#> 3 0.4422167 0.7117019 0.8797179
## compute in-sample log-likelihood sum via procast
sum(procast(m, type = "density", at = FIFA2018$goals, log = TRUE))
#> [1] -177.6971
logLik(m)
#> 'log Lik.' -177.6971 (df=2)