Create an Empirical Distribution

Description

An empirical distribution consists of a series of N observations drawn from a typically unknown distribution, i.e., a random sample ‘X’.

Draws n random values from the empirical ensemble with replacement.

Please see the documentation of [Empirical()] for some properties of the empirical ensemble distribution, as well as extensive examples showing how to calculate p-values and confidence intervals.

Usage

Empirical(x)

pempirical(q, y, lower.tail = TRUE, log.p = FALSE, na.rm = TRUE)

dempirical(x, y, log = FALSE, method = "hist", ...)

qempirical(p, y, lower.tail = TRUE, log.p = FALSE, na.rm = TRUE, ...)

rempirical(n, y, na.rm = TRUE)

## S3 method for class 'Empirical'
mean(x, ...)

## S3 method for class 'Empirical'
variance(x, ...)

## S3 method for class 'Empirical'
skewness(x, type = 1L, ...)

## S3 method for class 'Empirical'
kurtosis(x, type = 3L, ...)

## S3 method for class 'Empirical'
random(x, n = 1L, drop = TRUE, ...)

## S3 method for class 'Empirical'
pdf(d, x, drop = TRUE, elementwise = NULL, ...)

## S3 method for class 'Empirical'
log_pdf(d, x, drop = TRUE, elementwise = NULL, ...)

## S3 method for class 'Empirical'
cdf(d, x, drop = TRUE, elementwise = NULL, ...)

## S3 method for class 'Empirical'
quantile(x, probs, drop = TRUE, elementwise = NULL, ...)

## S3 method for class 'Empirical'
support(d, drop = TRUE, ...)

Arguments

x A vector of elements whose cumulative probabilities you would like to determine given the distribution ‘d’.
q vector of quantiles.
y vector of observations of the empirical distribution with two or more non-missing finite values.
lower.tail logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x].
na.rm logical evaluating to TRUE or FALSE indicating whether NA values should be stripped before the computation proceeds.
log, log.p logical; if TRUE, probabilities p are given as log(p).
method character; the method to calculate the empirical density. Either “hist” (default) or “density”.
... Currently not used.
p vector of probabilities.
n The number of samples to draw. Defaults to ‘1L’.
type integer between 1L and 3L (default) selecting one of three algorithms. See Details for more information.
drop logical. Should the result be simplified to a vector if possible?
d An ‘Empirical’ object created by a call to [Empirical()].
elementwise logical. Should each distribution in x be evaluated at all elements of probs (elementwise = FALSE, yielding a matrix)? Or, if x and probs have the same length, should the evaluation be done element by element (elementwise = TRUE, yielding a vector)? The default of NULL means that elementwise = TRUE is used if the lengths match and otherwise elementwise = FALSE is used.
probs A vector of probabilities.

Details

The creation function [Empirical()] allows for a variety of different objects as main input x.

  • Vector: Assumes that the vector contains a series of observations from one empirical distribution.

  • List (named or unnamed) of vectors: Each element in the list describes one empirical distribution defined by the numeric values in each of the vectors.

  • Matrix or data.frame: Each row corresponds to one empirical distribution, whilst the columns contain the individual observations.

Missing values are allowed; however, each distribution requires at least two finite observations (-Inf/Inf are replaced by NA).
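The input handling described above can be sketched in base R. This is only an illustration under stated assumptions: `as_obs_list()` is a hypothetical helper, not a function of the package, and it covers vectors, lists of vectors, and matrices only (data frames are left out).

```r
# Hypothetical helper (not part of the package): convert the supported
# input types into a plain list of numeric vectors, one per distribution.
as_obs_list <- function(x) {
  if (is.matrix(x)) {
    # matrix: one distribution per row
    x <- lapply(seq_len(nrow(x)), function(i) x[i, ])
  } else if (!is.list(x)) {
    # single vector: one distribution
    x <- list(x)
  }
  lapply(x, function(y) {
    y <- as.numeric(y)        # e.g. character input is coerced to numeric
    y[!is.finite(y)] <- NA    # -Inf/Inf (and NaN) are treated as missing
    if (sum(!is.na(y)) < 2L) {
      stop("each distribution requires at least two finite observations")
    }
    y
  })
}

# A list with a character vector and a vector containing NA
str(as_obs_list(list(c("1", "2", "Inf"), c(3, 4, NA))))
```

Keeping the result as a list allows distributions with different numbers of observations, which a matrix could not represent without padding.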

Support: \(R\), the set of all real numbers

Mean:

\(\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i\)

Variance:

\(\frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})^2\)
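As a quick check, the sample mean and the sample variance (sum of squared deviations divided by N - 1) can be computed by hand in base R and compared against `mean()` and `var()`. The data vector `y` is made up for illustration, and missing values are dropped first, which is assumed to mirror ‘na.rm = TRUE’.

```r
# Made-up observations, including one missing value
y <- c(2.1, -0.4, 1.3, NA, 0.7)
y <- y[!is.na(y)]                    # drop missing values first
N <- length(y)

xbar <- sum(y) / N                   # sample mean
s2   <- sum((y - xbar)^2) / (N - 1)  # sample variance with N - 1 denominator

# Both agree with the base R implementations
all.equal(xbar, mean(y))
all.equal(s2, var(y))
```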

Skewness:

\(S_1 = \sqrt{N} \frac{\sum_{i=1}^N (x_i - \bar{x})^3}{\sqrt{\big(\sum_{i=1}^N (x_i - \bar{x})^2\big)^3}}\)

\(S_2 = \frac{\sqrt{N * (N - 1)}}{(N - 2)} S_1\) (only defined for \(N > 2\))

\(S_3 = \sqrt{(1 - \frac{1}{N})^3} * S_1\) (default)

For more details about the different types of sample skewness see Joanes and Gill (1998).
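The three skewness formulas can be transcribed directly into base R. This is a minimal sketch, not the package's implementation; `skew_types()` is a hypothetical name and the data are made up.

```r
# Hypothetical helper: all three skewness types for one sample
skew_types <- function(y) {
  y   <- y[is.finite(y)]             # keep finite observations only
  N   <- length(y)
  dev <- y - mean(y)                 # deviations from the sample mean
  S1  <- sqrt(N) * sum(dev^3) / sqrt(sum(dev^2)^3)
  S2  <- if (N > 2) sqrt(N * (N - 1)) / (N - 2) * S1 else NA_real_
  S3  <- sqrt((1 - 1 / N)^3) * S1
  c(S1 = S1, S2 = S2, S3 = S3)
}

# Right-skewed sample: all three measures are positive
skew_types(c(1, 2, 3, 10))
```

All three types share the sign of \(S_1\) and differ only by a factor that depends on N, so they converge as N grows.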

Kurtosis:

\(K_1 = N * \frac{\sum_{i=1}^N (x_i - \bar{x})^4}{\big(\sum_{i=1}^N (x_i - \bar{x})^2\big)^2} - 3\)

\(K_2 = \frac{((N + 1) * K_1 + 6) * (N - 1)}{(N - 2) * (N - 3)}\) (only defined for \(N > 3\))

\(K_3 = \big(1 - \frac{1}{N}\big)^2 * (K_1 + 3) - 3\) (default)

For more details about the different types of sample kurtosis see Joanes and Gill (1998).
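The kurtosis formulas can be sketched the same way in base R. `kurt_types()` is a hypothetical helper, not the package's code; the guard `N > 3` reflects the factor (N - 3) in the denominator of \(K_2\).

```r
# Hypothetical helper: all three (excess) kurtosis types for one sample
kurt_types <- function(y) {
  y   <- y[is.finite(y)]             # keep finite observations only
  N   <- length(y)
  dev <- y - mean(y)                 # deviations from the sample mean
  K1  <- N * sum(dev^4) / sum(dev^2)^2 - 3
  K2  <- if (N > 3) {
    ((N + 1) * K1 + 6) * (N - 1) / ((N - 2) * (N - 3))
  } else {
    NA_real_
  }
  K3  <- (1 - 1 / N)^2 * (K1 + 3) - 3
  c(K1 = K1, K2 = K2, K3 = K3)
}

# Evenly spaced sample: flatter than a normal distribution, so all negative
kurt_types(1:5)
```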

Probability density function (p.d.f):

Note that ‘quantile()’ is the inverse of ‘cdf()’.
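The relationship between the empirical distribution function, its quantiles, and random draws can be illustrated with base R alone. The sketch uses `stats::ecdf()`, `stats::quantile()` with `type = 1`, and `sample()`; whether the package methods use exactly these definitions is an assumption, and the data are made up.

```r
y <- c(0.3, -1.2, 2.5, 0.9, -0.4)

# Empirical distribution function: a step function that jumps by 1/N
# at each observation
Fn <- ecdf(y)
Fn(0.9)    # proportion of observations <= 0.9

# Quantile as the inverse of the step function (type = 1): the smallest
# observation whose cumulative probability reaches p
p <- 0.5
q <- quantile(y, probs = p, type = 1, names = FALSE)
Fn(q) >= p

# Random values: resample the observations with replacement
draws <- sample(y, size = 10, replace = TRUE)
all(draws %in% y)
```

Because the distribution function is a step function, it has no exact inverse; `type = 1` picks the left-continuous generalized inverse, which is one common convention among several.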

Value

An ‘Empirical’ object.

In case of a single distribution object or ‘n = 1’, either a numeric vector of length ‘n’ (if ‘drop = TRUE’, default) or a ‘matrix’ with ‘n’ columns (if ‘drop = FALSE’).

In case of a single distribution object, either a numeric vector of length ‘length(x)’ (if ‘drop = TRUE’, default) or a ‘matrix’ with ‘length(x)’ columns (if ‘drop = FALSE’). In case of a vectorized distribution object, a matrix with ‘length(x)’ columns containing all possible combinations.

In case of a single distribution object, either a numeric vector of length ‘length(probs)’ (if ‘drop = TRUE’, default) or a ‘matrix’ with ‘length(probs)’ columns (if ‘drop = FALSE’). In case of a vectorized distribution object, a matrix with ‘length(probs)’ columns containing all possible combinations.

In case of a single distribution object, a numeric vector of length 2 with the minimum and maximum value of the support (if ‘drop = TRUE’, default) or a ‘matrix’ with 2 columns. In case of a vectorized distribution object, a matrix with 2 columns containing all minima and maxima.

References

Joanes DN and Gill CA (1998). “Comparing Measures of Sample Skewness and Kurtosis.” Journal of the Royal Statistical Society D, 47(1), 183–189. doi:10.1111/1467-9884.00122

Examples

X
[1] "Empirical distribution (Min. -2.100, Max.  2.187, N = 50)"
mean(X)
[1] -0.09838857
variance(X)
[1] 1.076242
skewness(X)
[1] 0.1027858
kurtosis(X)
[1] -0.5339262
random(X, 10)
 [1]  0.62280108 -1.66020539 -0.06429479 -0.61645815  0.14298835 -1.85883315
 [7] -0.82054223 -1.66020539 -0.88294400 -0.43544484
pdf(X, 2)
[1] 0.04
log_pdf(X, 2)
[1] -3.218876
cdf(X, 4)
[1] 1
quantile(X, 0.7)
[1] 0.3600124
### example: allowed types/classes of input arguments

## Single vector (will be coerced to numeric)
Y1  <- rnorm(3, mean = -10)
d1 <- Empirical(Y1)
d1
[1] "Empirical distribution (Min. -10.70, Max.  -9.95, N = 3)"
mean(d1)
[1] -10.28573
## Unnamed list of vectors
Y2 <- list(as.character(rnorm(3, mean = -10)),
           runif(6),
           rpois(4, lambda = 15))
d2 <- Empirical(Y2)
d2
[1] "Empirical distribution (Min. -10.6917, Max.  -8.1584, N = 3)"
[2] "Empirical distribution (Min.   0.2365, Max.   0.8445, N = 6)"
[3] "Empirical distribution (Min.  13.0000, Max.  22.0000, N = 4)"
mean(d2)
[1] -9.7327191  0.5375046 17.5000000
## Named list of vectors
Y3 <- list("Normal"  = as.character(rnorm(3, mean = -10)),
           "Uniform" = runif(6),
           "Poisson" = rpois(4, lambda = 15))
d3 <- Empirical(Y3)
d3
                                                        Normal 
"Empirical distribution (Min. -11.1410, Max.  -8.4768, N = 3)" 
                                                       Uniform 
"Empirical distribution (Min.   0.1372, Max.   0.9940, N = 6)" 
                                                       Poisson 
"Empirical distribution (Min.  16.0000, Max.  22.0000, N = 4)" 
mean(d3)
     Normal     Uniform     Poisson 
-10.0322492   0.5316866  18.2500000 
## Matrix or data.frame
Y4 <- matrix(rnorm(20), ncol = 5, dimnames = list(sprintf("D_%d", 1:4), sprintf("obs_%d", 1:5)))
d4 <- Empirical(Y4)
d4
                                                         D_1 
"Empirical distribution (Min. -0.2841, Max.  1.0164, N = 5)" 
                                                         D_2 
"Empirical distribution (Min. -0.6239, Max.  1.1759, N = 5)" 
                                                         D_3 
"Empirical distribution (Min. -2.3085, Max.  1.7337, N = 5)" 
                                                         D_4 
"Empirical distribution (Min. -1.5264, Max.  2.4897, N = 5)" 
d5 <- Empirical(as.data.frame(Y4))
d5
                                                       obs_1 
"Empirical distribution (Min. -2.3085, Max. -0.2841, N = 4)" 
                                                       obs_2 
"Empirical distribution (Min. -1.1744, Max.  1.1759, N = 4)" 
                                                       obs_3 
"Empirical distribution (Min. -0.1304, Max.  2.4897, N = 4)" 
                                                       obs_4 
"Empirical distribution (Min. -1.5264, Max.  1.0164, N = 4)" 
                                                       obs_5 
"Empirical distribution (Min.  0.1128, Max.  1.6950, N = 4)"