topmodels

Create an Empirical Distribution

Description

An empirical distribution consists of a series of N observations out of a typically unknown distribution, i.e., a random sample ‘X’.

‘random()’ draws ‘n’ random values from the empirical ensemble with replacement.

Please see the documentation of [Empirical()] for some properties of the empirical ensemble distribution, as well as extensive examples showing how to calculate p-values and confidence intervals.

TODO(RETO): Check description

Usage

Empirical(x)

pempirical(q, y, lower.tail = TRUE, log.p = FALSE, na.rm = TRUE)

dempirical(x, y, log = FALSE, method = "hist", ...)

qempirical(p, y, lower.tail = TRUE, log.p = FALSE, na.rm = TRUE, ...)

rempirical(n, y, na.rm = TRUE)

## S3 method for class 'Empirical'
mean(x, ...)

## S3 method for class 'Empirical'
variance(x, ...)

## S3 method for class 'Empirical'
skewness(x, type = 1L, ...)

## S3 method for class 'Empirical'
kurtosis(x, type = 3L, ...)

## S3 method for class 'Empirical'
random(x, n = 1L, drop = TRUE, ...)

## S3 method for class 'Empirical'
pdf(d, x, drop = TRUE, elementwise = NULL, ...)

## S3 method for class 'Empirical'
log_pdf(d, x, drop = TRUE, elementwise = NULL, ...)

## S3 method for class 'Empirical'
cdf(d, x, drop = TRUE, elementwise = NULL, ...)

## S3 method for class 'Empirical'
quantile(x, probs, drop = TRUE, elementwise = NULL, ...)

## S3 method for class 'Empirical'
support(d, drop = TRUE, ...)

Arguments

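`x`	Depending on the function: the input from which [Empirical()] creates the distribution(s) (see Details); an ‘Empirical’ object (for ‘mean()’, ‘variance()’, ‘skewness()’, ‘kurtosis()’, ‘random()’, and ‘quantile()’); or a vector of elements at which the distribution ‘d’ is evaluated, e.g., whose cumulative probabilities you would like to determine (for ‘pdf()’, ‘log_pdf()’, ‘cdf()’, and ‘dempirical()’).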
`q`	vector of quantiles.
`y`	vector of observations of the empirical distribution with two or more non-missing finite values.
`lower.tail`	logical; if `TRUE` (default), probabilities are `P[X <= x]`, otherwise `P[X > x]`.
`na.rm`	logical evaluating to `TRUE` or `FALSE` indicating whether `NA` values should be stripped before the computation proceeds.
`log`, `log.p`	logical; if `TRUE`, probabilities `p` are given as `log(p)`.
`method`	character; the method to calculate the empirical density. Either `“hist”` (default) or `“density”`.
`…`	Currently not used.
`p`	vector of probabilities.
`n`	The number of samples to draw. Defaults to ‘1L’.
`type`	integer between `1L` and `3L` (default) selecting one of three algorithms. See Details for more information.
`drop`	logical. Should the result be simplified to a vector if possible?
`d`	An ‘Empirical’ object created by a call to [Empirical()].
`elementwise`	logical. Should each distribution in `x` be evaluated at all elements of `probs` (`elementwise = FALSE`, yielding a matrix)? Or, if `x` and `probs` have the same length, should the evaluation be done element by element (`elementwise = TRUE`, yielding a vector)? The default of `NULL` means that `elementwise = TRUE` is used if the lengths match and otherwise `elementwise = FALSE` is used.
`probs`	A vector of probabilities.
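
The difference between the two `elementwise` modes can be illustrated with base R alone. The sketch below is only an emulation under stated assumptions: the names `samples`, `all_combos`, and `pairwise` are hypothetical, and `stats::quantile(type = 1)` stands in for the quantile rule, which may differ from what the ‘Empirical’ methods use internally.

```r
# Two hypothetical samples, each standing in for one empirical distribution
samples <- list(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
probs <- c(0.25, 0.75)

# Analogue of elementwise = FALSE: every sample evaluated at every
# probability, giving a matrix with one row per sample and
# length(probs) columns
all_combos <- t(sapply(samples, quantile, probs = probs, type = 1, names = FALSE))

# Analogue of elementwise = TRUE: the i-th sample is evaluated at the
# i-th probability only, giving a vector
pairwise <- mapply(
  function(y, p) quantile(y, probs = p, type = 1, names = FALSE),
  samples, probs
)

all_combos
pairwise
```

The element-by-element result is the diagonal of the all-combinations matrix, which is why the lengths must match for that mode.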

Details

The creation function [Empirical()] allows for a variety of different objects as main input x.

Vector: Assumes that the vector contains a series of observations from one empirical distribution.
List (named or unnamed) of vectors: Each element in the list describes one empirical distribution defined by the numeric values in each of the vectors.
Matrix or data.frame: Each row corresponds to one empirical distribution, whilst the columns contain the individual observations.

Missing values are allowed; however, each distribution requires at least two finite observations (‘-Inf’ and ‘Inf’ are replaced by ‘NA’).
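
The handling of non-finite values described above can be sketched in base R. This is a minimal illustration, not the package's implementation: `clean_sample()` is a hypothetical helper, and the row-wise reading of the matrix simply follows the description of matrix input given here.

```r
# Hypothetical helper (not part of topmodels): coerce to numeric, replace
# infinite values by NA, and require at least two finite observations.
clean_sample <- function(y) {
  y <- as.numeric(y)
  y[!is.finite(y)] <- NA_real_
  if (sum(!is.na(y)) < 2L) {
    stop("each distribution requires at least two finite observations")
  }
  y
}

# A matrix is read row by row: each row is one sample
m <- matrix(c(1, 2, Inf,
              4, NA, 6), nrow = 2, byrow = TRUE)
samples <- lapply(seq_len(nrow(m)), function(i) clean_sample(m[i, ]))
samples
```

A vector such as `c(1, NA, Inf)` would be rejected by this helper, because only one finite value remains after the replacement.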

Support: \(\mathbb{R}\), the set of all real numbers

Mean:

\(\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i\)

Variance:

\(\frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})^2\)
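
As a quick numerical check, the sample mean and the usual sample variance (squared deviations with an \(N - 1\) denominator) can be computed by hand in base R and compared with `mean()` and `var()`. The sample `y` and the variable names are illustrative assumptions.

```r
# Hypothetical sample
y <- c(2.1, -0.4, 1.3, 0.7, -1.2)
N <- length(y)

# Mean: (1 / N) * sum of the observations
y_mean <- sum(y) / N

# Variance: sum of squared deviations divided by N - 1
y_var <- sum((y - y_mean)^2) / (N - 1)

c(mean = y_mean, variance = y_var)
```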

Skewness:

\(S_1 = \sqrt{N} \frac{\sum_{i=1}^N (x_i - \bar{x})^3}{\sqrt{\big(\sum_{i=1}^N (x_i - \bar{x})^2\big)^3}}\)

\(S_2 = \frac{\sqrt{N * (N - 1)}}{(N - 2)} S_1\) (only defined for \(N > 2\))

\(S_3 = \sqrt{(1 - \frac{1}{N})^3} * S_1\) (default)

For more details about the different types of sample skewness see Joanes and Gill (1998).
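
The three skewness variants can be written out directly in base R. This is a sketch of the formulas as stated, using a hypothetical sample `y`; it does not call the package and makes no claim about how ‘skewness()’ is implemented.

```r
# Hypothetical sample
y <- c(2.1, -0.4, 1.3, 0.7, -1.2, 3.5)
N <- length(y)
dev <- y - mean(y)  # deviations from the sample mean

# S_1: based on the third and second central moments
S1 <- sqrt(N) * sum(dev^3) / sqrt(sum(dev^2)^3)

# S_2: rescaled version, needs N > 2
S2 <- sqrt(N * (N - 1)) / (N - 2) * S1

# S_3: shrinks S_1 by the factor (1 - 1/N)^(3/2)
S3 <- sqrt((1 - 1 / N)^3) * S1

c(S1 = S1, S2 = S2, S3 = S3)
```

For a right-skewed sample like this one, the three values share the same sign and differ only by factors that approach 1 as \(N\) grows.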

Kurtosis:

\(K_1 = N * \frac{\sum_{i=1}^N (x_i - \bar{x})^4}{\big(\sum_{i=1}^N (x_i - \bar{x})^2\big)^2} - 3\)

\(K_2 = \frac{\big((N + 1) * K_1 + 6\big) * (N - 1)}{(N - 2) * (N - 3)}\) (only defined for \(N > 3\))

\(K_3 = \big(1 - \frac{1}{N}\big)^2 * (K_1 + 3) - 3\) (default)

For more details about the different types of sample kurtosis see Joanes and Gill (1998).
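
The kurtosis variants can be checked the same way in base R. The sketch assumes the grouping \(\big((N + 1) K_1 + 6\big)(N - 1)\) in the numerator of \(K_2\), as in Joanes and Gill (1998); the sample `y` is hypothetical and the code does not call the package.

```r
# Hypothetical sample
y <- c(2.1, -0.4, 1.3, 0.7, -1.2, 3.5)
N <- length(y)
dev <- y - mean(y)  # deviations from the sample mean

# K_1: excess kurtosis from the fourth and second central moments
K1 <- N * sum(dev^4) / sum(dev^2)^2 - 3

# K_2: the denominator contains (N - 3), so N must exceed 3
K2 <- ((N + 1) * K1 + 6) * (N - 1) / ((N - 2) * (N - 3))

# K_3: rescales (K_1 + 3) by (1 - 1/N)^2 before subtracting 3
K3 <- (1 - 1 / N)^2 * (K1 + 3) - 3

c(K1 = K1, K2 = K2, K3 = K3)
```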

TODO(RETO): Add empirical distribution function information (step-function 1/N)

Probability density function (p.d.f):

The empirical density is calculated with the approach selected by `method` (see Arguments). Note that ‘quantile()’ is the inverse of ‘cdf()’.
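
The relationship between the distribution function and its inverse can be illustrated with base R. The sketch assumes the standard empirical distribution function, a step function that jumps by \(1/N\) at each observation, and uses `stats::ecdf()` and `stats::quantile(type = 1)` as stand-ins; whether ‘cdf()’ and ‘quantile()’ for ‘Empirical’ objects use exactly these rules is an assumption.

```r
# Hypothetical sample without ties
y <- c(2, 4, 6, 8)
N <- length(y)

# Empirical distribution function: proportion of observations <= q
Fn <- stats::ecdf(y)
q <- c(1, 2, 5, 8)
Fn(q)                                          # 0.00 0.25 0.50 1.00

# The same values computed by hand
manual <- sapply(q, function(v) mean(y <= v))

# Inverse of the step function: smallest observation whose cumulative
# probability reaches p (quantile type 1)
q50 <- quantile(y, probs = 0.5, type = 1, names = FALSE)
q50                                            # 4
Fn(q50)                                        # 0.5
```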

Value

[Empirical()]: An ‘Empirical’ object.

‘random()’: In case of a single distribution object or ‘n = 1’, either a numeric vector of length ‘n’ (if ‘drop = TRUE’, default) or a ‘matrix’ with ‘n’ columns (if ‘drop = FALSE’).

‘pdf()’, ‘log_pdf()’, ‘cdf()’: In case of a single distribution object, either a numeric vector of length ‘length(x)’ (if ‘drop = TRUE’, default) or a ‘matrix’ with ‘length(x)’ columns (if ‘drop = FALSE’). In case of a vectorized distribution object, a matrix with ‘length(x)’ columns containing all possible combinations.

‘quantile()’: In case of a single distribution object, either a numeric vector of length ‘length(probs)’ (if ‘drop = TRUE’, default) or a ‘matrix’ with ‘length(probs)’ columns (if ‘drop = FALSE’). In case of a vectorized distribution object, a matrix with ‘length(probs)’ columns containing all possible combinations.

‘support()’: In case of a single distribution object, a numeric vector of length 2 with the minimum and maximum value of the support (if ‘drop = TRUE’, default) or a ‘matrix’ with 2 columns (if ‘drop = FALSE’). In case of a vectorized distribution object, a matrix with 2 columns containing all minima and maxima.

References

Joanes DN and Gill CA (1998). “Comparing Measures of Sample Skewness and Kurtosis.” Journal of the Royal Statistical Society D, 47(1), 183–189. doi:10.1111/1467-9884.00122

Examples

library("topmodels")


require("distributions3")
set.seed(28)

X <- Empirical(rnorm(50))
X

[1] "Empirical distribution (Min. -2.100, Max.  2.187, N = 50)"

mean(X)

[1] -0.09838857

variance(X)

[1] 1.076242

skewness(X)

[1] 0.1027858

kurtosis(X)

[1] -0.5339262

random(X, 10)

 [1]  0.62280108 -1.66020539 -0.06429479 -0.61645815  0.14298835 -1.85883315
 [7] -0.82054223 -1.66020539 -0.88294400 -0.43544484

pdf(X, 2)

[1] 0.04

log_pdf(X, 2)

[1] -3.218876

cdf(X, 4)

[1] 1

quantile(X, 0.7)

[1] 0.3600124

### example: allowed types/classes of input arguments

## Single vector (will be coerced to numeric)
Y1  <- rnorm(3, mean = -10)
d1 <- Empirical(Y1)
d1

[1] "Empirical distribution (Min. -10.70, Max.  -9.95, N = 3)"

mean(d1)

[1] -10.28573

## Unnamed list of vectors
Y2 <- list(as.character(rnorm(3, mean = -10)),
           runif(6),
           rpois(4, lambda = 15))
d2 <- Empirical(Y2)
d2

[1] "Empirical distribution (Min. -10.6917, Max.  -8.1584, N = 3)"
[2] "Empirical distribution (Min.   0.2365, Max.   0.8445, N = 6)"
[3] "Empirical distribution (Min.  13.0000, Max.  22.0000, N = 4)"

mean(d2)

[1] -9.7327191  0.5375046 17.5000000

## Named list of vectors
Y3 <- list("Normal"  = as.character(rnorm(3, mean = -10)),
           "Uniform" = runif(6),
           "Poisson" = rpois(4, lambda = 15))
d3 <- Empirical(Y3)
d3

                                                        Normal 
"Empirical distribution (Min. -11.1410, Max.  -8.4768, N = 3)" 
                                                       Uniform 
"Empirical distribution (Min.   0.1372, Max.   0.9940, N = 6)" 
                                                       Poisson 
"Empirical distribution (Min.  16.0000, Max.  22.0000, N = 4)"

mean(d3)

     Normal     Uniform     Poisson 
-10.0322492   0.5316866  18.2500000

## Matrix or data.frame
Y4 <- matrix(rnorm(20), ncol = 5, dimnames = list(sprintf("D_%d", 1:4), sprintf("obs_%d", 1:5)))
d4 <- Empirical(Y4)
d4

                                                         D_1 
"Empirical distribution (Min. -0.2841, Max.  1.0164, N = 5)" 
                                                         D_2 
"Empirical distribution (Min. -0.6239, Max.  1.1759, N = 5)" 
                                                         D_3 
"Empirical distribution (Min. -2.3085, Max.  1.7337, N = 5)" 
                                                         D_4 
"Empirical distribution (Min. -1.5264, Max.  2.4897, N = 5)"

d5 <- Empirical(as.data.frame(Y4))
d5

                                                       obs_1 
"Empirical distribution (Min. -2.3085, Max. -0.2841, N = 4)" 
                                                       obs_2 
"Empirical distribution (Min. -1.1744, Max.  1.1759, N = 4)" 
                                                       obs_3 
"Empirical distribution (Min. -0.1304, Max.  2.4897, N = 4)" 
                                                       obs_4 
"Empirical distribution (Min. -1.5264, Max.  1.0164, N = 4)" 
                                                       obs_5 
"Empirical distribution (Min.  0.1128, Max.  1.6950, N = 4)"