Empirical.Rd
An empirical distribution consists of a series of N observations out of a typically unknown distribution, i.e., a random sample 'X'.
Draws `n` random values from the empirical ensemble with replacement.
Please see the documentation of [Empirical()] for some properties of the Empirical distribution, as well as extensive examples showing how to calculate p-values and confidence intervals.
TODO(RETO): Check description
Empirical(x)
pempirical(q, y, lower.tail = TRUE, log.p = FALSE, na.rm = TRUE)
dempirical(x, y, log = FALSE, method = "hist", ...)
qempirical(p, y, lower.tail = TRUE, log.p = FALSE, na.rm = TRUE, ...)
rempirical(n, y, na.rm = TRUE)
# S3 method for Empirical
mean(x, ...)
# S3 method for Empirical
variance(x, ...)
# S3 method for Empirical
skewness(x, type = 1L, ...)
# S3 method for Empirical
kurtosis(x, type = 3L, ...)
# S3 method for Empirical
random(x, n = 1L, drop = TRUE, ...)
# S3 method for Empirical
pdf(d, x, drop = TRUE, elementwise = NULL, ...)
# S3 method for Empirical
log_pdf(d, x, drop = TRUE, elementwise = NULL, ...)
# S3 method for Empirical
cdf(d, x, drop = TRUE, elementwise = NULL, ...)
# S3 method for Empirical
quantile(x, probs, drop = TRUE, elementwise = NULL, ...)
# S3 method for Empirical
support(d, drop = TRUE, ...)
A vector of elements whose cumulative probabilities you would like to determine given the distribution `d`.
vector of quantiles.
vector of observations of the empirical distribution with two or more non-missing finite values.
logical; if `TRUE` (default), probabilities are `P[X <= x]`; otherwise, `P[X > x]`.
logical evaluating to `TRUE` or `FALSE` indicating whether `NA` values should be stripped before the computation proceeds.
logical; if `TRUE`, probabilities `p` are given as `log(p)`.
character; the method to calculate the empirical density. Either `"hist"` (default) or `"density"`.
Currently not used.
vector of probabilities.
The number of samples to draw. Defaults to `1L`.
integer between `1L` and `3L` (default) selecting one of three algorithms. See Details for more information.
logical. Should the result be simplified to a vector if possible?
An `Empirical` object created by a call to [Empirical()].
logical. Should each distribution in `x` be evaluated at all elements of `probs` (`elementwise = FALSE`, yielding a matrix)? Or, if `x` and `probs` have the same length, should the evaluation be done element by element (`elementwise = TRUE`, yielding a vector)? The default of `NULL` means that `elementwise = TRUE` is used if the lengths match and otherwise `elementwise = FALSE` is used.
A vector of probabilities.
An `Empirical` object.
In case of a single distribution object or `n = 1`, either a numeric vector of length `n` (if `drop = TRUE`, default) or a `matrix` with `n` columns (if `drop = FALSE`).
In case of a single distribution object, either a numeric vector of length `length(x)` (if `drop = TRUE`, default) or a `matrix` with `length(x)` columns (if `drop = FALSE`). In case of a vectorized distribution object, a matrix with `length(x)` columns containing all possible combinations.
In case of a single distribution object, either a numeric vector of length `length(probs)` (if `drop = TRUE`, default) or a `matrix` with `length(probs)` columns (if `drop = FALSE`). In case of a vectorized distribution object, a matrix with `length(probs)` columns containing all possible combinations.
In case of a single distribution object, a numeric vector of length 2 with the minimum and maximum value of the support (if `drop = TRUE`, default) or a `matrix` with 2 columns. In case of a vectorized distribution object, a matrix with 2 columns containing all minima and maxima.
The creation function [Empirical()] allows for a variety of different objects as main input `x`.
* Vector: Assumes that the vector contains a series of observations from one empirical distribution.
* List (named or unnamed) of vectors: Each element in the list describes one empirical distribution defined by the numeric values in each of the vectors.
* Matrix or data.frame: Each row corresponds to one empirical distribution, whilst the columns contain the individual observations.
Missing values are allowed; however, each distribution requires at least two finite observations (`-Inf`/`Inf` are replaced by `NA`).
**Support**: \(R\), the set of all real numbers
**Mean**: $$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
**Variance**: $$\frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})^2$$
**Skewness**:
\(S_1 = \sqrt{N} \frac{\sum_{i=1}^N (x_i - \bar{x})^3}{\sqrt{\big(\sum_{i=1}^N (x_i - \bar{x})^2\big)^3}}\)
\(S_2 = \frac{\sqrt{N * (N - 1)}}{(N - 2)} S_1\) (only defined for \(N > 2\))
\(S_3 = \sqrt{(1 - \frac{1}{N})^3} * S_1\) (default)
For more details about the different types of sample skewness see Joanes and Gill (1998).
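The three skewness formulas above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `sample_skewness()` is a hypothetical helper introduced here, and dropping non-finite values up front is an assumption modelled on the input handling described earlier.

```r
# Base R sketch of the three sample skewness estimators (Joanes and Gill, 1998).
# `sample_skewness` is a hypothetical helper, not a distributions3 function.
sample_skewness <- function(x, type = 1L) {
  x <- x[is.finite(x)]                 # assumption: drop NA, NaN, and +/-Inf
  N <- length(x)
  stopifnot(N >= 2L, type %in% 1:3)
  dev <- x - mean(x)                   # deviations from the sample mean
  S1 <- sqrt(N) * sum(dev^3) / sqrt(sum(dev^2)^3)
  switch(as.character(type),
         "1" = S1,
         "2" = if (N > 2) sqrt(N * (N - 1)) / (N - 2) * S1 else NA_real_,
         "3" = sqrt((1 - 1 / N)^3) * S1)
}

y <- c(1, 2, 4, 8, 16)
# Evaluate all three types on the same right-skewed sample
sapply(1:3, function(type) sample_skewness(y, type = type))
```

All three types share the sign of \(S_1\) and differ only by a factor that depends on \(N\), so they converge as the sample grows.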
**Kurtosis**:
\(K_1 = N * \frac{\sum_{i=1}^N (x_i - \bar{x})^4}{\big(\sum_{i=1}^N (x_i - \bar{x})^2\big)^2} - 3\)
\(K_2 = \frac{((N + 1) * K_1 + 6) * (N - 1)}{(N - 2) * (N - 3)}\) (only defined for \(N > 3\))
\(K_3 = \big(1 - \frac{1}{N}\big)^2 * (K_1 + 3) - 3\) (default)
For more details about the different types of sample kurtosis see Joanes and Gill (1998).
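A matching base R sketch for the three excess kurtosis formulas, again under stated assumptions: `sample_kurtosis()` is a hypothetical helper and not the package's code. The sketch returns `NA` for type 2 whenever \(N \le 3\), because the denominator of \(K_2\) contains the factor \((N - 3)\).

```r
# Base R sketch of the three sample excess kurtosis estimators
# (Joanes and Gill, 1998). `sample_kurtosis` is a hypothetical helper,
# not a distributions3 function.
sample_kurtosis <- function(x, type = 3L) {
  x <- x[is.finite(x)]                 # assumption: drop NA, NaN, and +/-Inf
  N <- length(x)
  stopifnot(N >= 2L, type %in% 1:3)
  dev <- x - mean(x)                   # deviations from the sample mean
  K1 <- N * sum(dev^4) / sum(dev^2)^2 - 3
  switch(as.character(type),
         "1" = K1,
         "2" = if (N > 3) {
           ((N + 1) * K1 + 6) * (N - 1) / ((N - 2) * (N - 3))
         } else {
           NA_real_                    # denominator vanishes for N = 2 or 3
         },
         "3" = (1 - 1 / N)^2 * (K1 + 3) - 3)
}

y <- c(-1, 1, -1, 1)
# Evaluate all three types on a two-point sample
sapply(1:3, function(type) sample_kurtosis(y, type = type))
```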
**TODO(RETO)**: Add empirical distribution function information (step-function 1/N)
**Probability density function (p.d.f)**:
Note that `quantile()` is the inverse of `cdf()`. Please see the documentation of [Empirical()] for some properties of the Empirical distribution, as well as extensive examples showing how to calculate p-values and confidence intervals.
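The relationship between the distribution function and the quantile function can be illustrated with base R alone. The sketch below uses `stats::ecdf()` and `stats::quantile()` with `type = 1`, which is the inverse of the empirical distribution function. Whether the `Empirical` methods use this exact quantile definition is not stated on this page, so treat it as an assumption.

```r
# Base R sketch: the empirical CDF is a step function with jumps of 1/N,
# and quantile(type = 1) is its (generalized) inverse.
y <- c(2.1, -0.4, 1.3, 0.7, -1.8, 0.2, 3.0, -0.9, 1.1, 0.5)

Fn <- ecdf(y)          # empirical distribution function of the sample
p  <- Fn(0.6)          # proportion of observations <= 0.6
q  <- quantile(y, probs = p, type = 1, names = FALSE)

p                      # 5 of 10 observations are <= 0.6
q                      # largest observation not exceeding 0.6
Fn(q)                  # mapping the quantile back recovers p
```

Because the empirical CDF is flat between observations, the round trip recovers the probability `p` but returns the nearest observation at or below the original evaluation point rather than the point itself.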
Joanes DN and Gill CA (1998). “Comparing Measures of Sample Skewness and Kurtosis.” Journal of the Royal Statistical Society D, 47(1), 183–189. doi:10.1111/1467-9884.00122
require("distributions3")
#> Loading required package: distributions3
#>
#> Attaching package: ‘distributions3’
#> The following object is masked from ‘package:stats’:
#>
#> Gamma
#> The following object is masked from ‘package:grDevices’:
#>
#> pdf
set.seed(28)
X <- Empirical(rnorm(50))
X
#> [1] "Empirical distribution (Min. -2.100, Max. 2.187, N = 50)"
mean(X)
#> [1] -0.09838857
variance(X)
#> [1] 1.076242
skewness(X)
#> [1] 0.1027858
kurtosis(X)
#> [1] -0.5339262
random(X, 10)
#> [1] 0.62280108 -1.66020539 -0.06429479 -0.61645815 0.14298835 -1.85883315
#> [7] -0.82054223 -1.66020539 -0.88294400 -0.43544484
pdf(X, 2)
#> [1] 0.04
log_pdf(X, 2)
#> [1] -3.218876
cdf(X, 4)
#> [1] 1
quantile(X, 0.7)
#> [1] 0.3600124
### example: allowed types/classes of input arguments
## Single vector (will be coerced to numeric)
Y1 <- rnorm(3, mean = -10)
d1 <- Empirical(Y1)
d1
#> [1] "Empirical distribution (Min. -10.70, Max. -9.95, N = 3)"
mean(d1)
#> [1] -10.28573
## Unnamed list of vectors
Y2 <- list(as.character(rnorm(3, mean = -10)),
           runif(6),
           rpois(4, lambda = 15))
d2 <- Empirical(Y2)
d2
#> [1] "Empirical distribution (Min. -10.6917, Max. -8.1584, N = 3)"
#> [2] "Empirical distribution (Min. 0.2365, Max. 0.8445, N = 6)"
#> [3] "Empirical distribution (Min. 13.0000, Max. 22.0000, N = 4)"
mean(d2)
#> [1] -9.7327191 0.5375046 17.5000000
## Named list of vectors
Y3 <- list("Normal"  = as.character(rnorm(3, mean = -10)),
           "Uniform" = runif(6),
           "Poisson" = rpois(4, lambda = 15))
d3 <- Empirical(Y3)
d3
#> Normal
#> "Empirical distribution (Min. -11.1410, Max. -8.4768, N = 3)"
#> Uniform
#> "Empirical distribution (Min. 0.1372, Max. 0.9940, N = 6)"
#> Poisson
#> "Empirical distribution (Min. 16.0000, Max. 22.0000, N = 4)"
mean(d3)
#> Normal Uniform Poisson
#> -10.0322492 0.5316866 18.2500000
## Matrix or data.frame
Y4 <- matrix(rnorm(20), ncol = 5, dimnames = list(sprintf("D_%d", 1:4), sprintf("obs_%d", 1:5)))
d4 <- Empirical(Y4)
d4
#> D_1
#> "Empirical distribution (Min. -0.2841, Max. 1.0164, N = 5)"
#> D_2
#> "Empirical distribution (Min. -0.6239, Max. 1.1759, N = 5)"
#> D_3
#> "Empirical distribution (Min. -2.3085, Max. 1.7337, N = 5)"
#> D_4
#> "Empirical distribution (Min. -1.5264, Max. 2.4897, N = 5)"
d5 <- Empirical(as.data.frame(Y4))
d5
#> obs_1
#> "Empirical distribution (Min. -2.3085, Max. -0.2841, N = 4)"
#> obs_2
#> "Empirical distribution (Min. -1.1744, Max. 1.1759, N = 4)"
#> obs_3
#> "Empirical distribution (Min. -0.1304, Max. 2.4897, N = 4)"
#> obs_4
#> "Empirical distribution (Min. -1.5264, Max. 1.0164, N = 4)"
#> obs_5
#> "Empirical distribution (Min. 0.1128, Max. 1.6950, N = 4)"