An empirical distribution consists of a series of N observations drawn from a typically unknown distribution, i.e., a random sample 'X'.

Draws n random values from the empirical ensemble with replacement.

Please see the documentation of [Empirical()] for some properties of the empirical distribution, as well as extensive examples showing how to calculate p-values and confidence intervals.

TODO(RETO): Check description

Empirical(x)

pempirical(q, y, lower.tail = TRUE, log.p = FALSE, na.rm = TRUE)

dempirical(x, y, log = FALSE, method = "hist", ...)

qempirical(p, y, lower.tail = TRUE, log.p = FALSE, na.rm = TRUE, ...)

rempirical(n, y, na.rm = TRUE)

# S3 method for Empirical
mean(x, ...)

# S3 method for Empirical
variance(x, ...)

# S3 method for Empirical
skewness(x, type = 1L, ...)

# S3 method for Empirical
kurtosis(x, type = 3L, ...)

# S3 method for Empirical
random(x, n = 1L, drop = TRUE, ...)

# S3 method for Empirical
pdf(d, x, drop = TRUE, elementwise = NULL, ...)

# S3 method for Empirical
log_pdf(d, x, drop = TRUE, elementwise = NULL, ...)

# S3 method for Empirical
cdf(d, x, drop = TRUE, elementwise = NULL, ...)

# S3 method for Empirical
quantile(x, probs, drop = TRUE, elementwise = NULL, ...)

# S3 method for Empirical
support(d, drop = TRUE, ...)

Arguments

x

A vector of elements whose cumulative probabilities you would like to determine given the distribution `d`.

q

vector of quantiles.

y

vector of observations of the empirical distribution with two or more non-missing finite values.

lower.tail

logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x].

na.rm

logical evaluating to TRUE or FALSE indicating whether NA values should be stripped before the computation proceeds.

log, log.p

logical; if TRUE, probabilities p are given as log(p).

method

character; the method used to calculate the empirical density. Either "hist" (default) or "density".

...

Currently not used.

p

vector of probabilities.

n

The number of samples to draw. Defaults to `1L`.

type

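integer between `1L` and `3L` selecting one of three algorithms. As shown under Usage, `skewness()` uses `type = 1L` and `kurtosis()` uses `type = 3L` unless specified otherwise. See Details for more information.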

drop

logical. Should the result be simplified to a vector if possible?

d

An `Empirical` object created by a call to [Empirical()].

elementwise

logical. Should each distribution in x be evaluated at all elements of probs (elementwise = FALSE, yielding a matrix)? Or, if x and probs have the same length, should the evaluation be done element by element (elementwise = TRUE, yielding a vector)? The default of NULL means that elementwise = TRUE is used if the lengths match and otherwise elementwise = FALSE is used.

probs

A vector of probabilities.

Value

An `Empirical` object.

In case of a single distribution object or `n = 1`, either a numeric vector of length `n` (if `drop = TRUE`, default) or a `matrix` with `n` columns (if `drop = FALSE`).

In case of a single distribution object, either a numeric vector of length `length(x)` (if `drop = TRUE`, default) or a `matrix` with `length(x)` columns (if `drop = FALSE`). In case of a vectorized distribution object, a matrix with `length(x)` columns containing all possible combinations.

In case of a single distribution object, either a numeric vector of length `length(probs)` (if `drop = TRUE`, default) or a `matrix` with `length(probs)` columns (if `drop = FALSE`). In case of a vectorized distribution object, a matrix with `length(probs)` columns containing all possible combinations.

In case of a single distribution object, a numeric vector of length 2 with the minimum and maximum value of the support (if `drop = TRUE`, default) or a `matrix` with 2 columns. In case of a vectorized distribution object, a matrix with 2 columns containing all minima and maxima.

Details

The creation function [Empirical()] allows for a variety of different objects as main input x.

* Vector: Assumes that the vector contains a series of observations from one empirical distribution.

* List (named or unnamed) of vectors: Each element in the list describes one empirical distribution defined by the numeric values in each of the vectors.

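* Matrix or data.frame: For a matrix, each row corresponds to one empirical distribution, whilst the columns contain the individual observations. Note that in the Examples, the data.frame input `d5` yields one distribution per column instead.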

Missing values are allowed; however, each distribution requires at least two finite observations (`-Inf` and `Inf` are replaced by `NA`).
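The rule on missing and infinite values can be sketched in base R as follows. This is only an illustration of the stated behavior, not the package's implementation; the helper name `sanitize_obs()` is hypothetical.

```r
# Hypothetical helper illustrating the input rule described above:
# infinite values become NA, and at least two finite values must remain.
sanitize_obs <- function(y) {
  y <- as.numeric(y)                 # coerce, e.g., character input to numeric
  y[is.infinite(y)] <- NA_real_      # -Inf/Inf are treated as missing
  if (sum(!is.na(y)) < 2L) {
    stop("each distribution requires at least two finite observations")
  }
  y
}

sanitize_obs(c(1.5, NA, Inf, -0.3))  # two finite values remain, so this passes
```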

**Support**: \(R\), the set of all real numbers

**Mean**: $$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

**Variance**: $$\frac{1}{N - 1} \sum_{i=1}^{N} (x_i - \bar{x})^2$$
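As a quick check of the mean and the variance (the latter using squared deviations from the mean), the following base R sketch computes both by hand and compares them with `mean()` and `var()`. The observations in `x` are made up for illustration.

```r
# Hand-computed sample mean and variance for a hypothetical sample.
x <- c(2.1, -0.4, 1.3, 0.7, -1.2)
N <- length(x)

x_bar <- sum(x) / N                     # sample mean
s2    <- sum((x - x_bar)^2) / (N - 1)   # sample variance with N - 1 denominator

# Both agree with the base R estimators.
all.equal(x_bar, mean(x))
all.equal(s2, var(x))
```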

**Skewness**:

\(S_1 = \sqrt{N} \frac{\sum_{i=1}^N (x_i - \bar{x})^3}{\sqrt{\big(\sum_{i=1}^N (x_i - \bar{x})^2\big)^3}}\)

\(S_2 = \frac{\sqrt{N * (N - 1)}}{(N - 2)} S_1\) (only defined for \(N > 2\))

\(S_3 = \sqrt{(1 - \frac{1}{N})^3} * S_1\) (default)

For more details about the different types of sample skewness see Joanes and Gill (1998).
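The three skewness types can be written out in a few lines of base R. The function below is a sketch that follows the formulas for \(S_1\), \(S_2\), and \(S_3\) literally; `skew_sketch()` is a hypothetical name and is not part of the package.

```r
# Sketch of the three sample skewness types (Joanes and Gill, 1998).
skew_sketch <- function(x, type = 1L) {
  x <- x[is.finite(x)]                  # drop NA and infinite values
  N <- length(x)
  d <- x - mean(x)                      # deviations from the sample mean
  S1 <- sqrt(N) * sum(d^3) / sqrt(sum(d^2)^3)
  switch(as.character(type),
    "1" = S1,
    "2" = sqrt(N * (N - 1)) / (N - 2) * S1,   # requires N > 2
    "3" = sqrt((1 - 1 / N)^3) * S1,
    stop("type must be 1, 2, or 3")
  )
}

x <- c(1, 2, 3, 10)                     # hypothetical right-skewed sample
sapply(1:3, function(t) skew_sketch(x, type = t))
```

All three types share the sign of \(S_1\) and differ only by a factor that depends on \(N\), so they converge as the sample grows.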

**Kurtosis**:

\(K_1 = N * \frac{\sum_{i=1}^N (x_i - \bar{x})^4}{\big(\sum_{i=1}^N (x_i - \bar{x})^2\big)^2} - 3\)

\(K_2 = \frac{((N + 1) * K_1 + 6) * (N - 1)}{(N - 2) * (N - 3)}\) (only defined for \(N > 3\))

\(K_3 = \big(1 - \frac{1}{N}\big)^2 * (K_1 + 3) - 3\) (default)

For more details about the different types of sample kurtosis see Joanes and Gill (1998).
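A matching base R sketch for the three kurtosis types is given below. It reads \(K_2\) as \(((N + 1) K_1 + 6)(N - 1) / ((N - 2)(N - 3))\), following Joanes and Gill (1998); `kurt_sketch()` is a hypothetical name and is not part of the package.

```r
# Sketch of the three sample (excess) kurtosis types (Joanes and Gill, 1998).
kurt_sketch <- function(x, type = 3L) {
  x <- x[is.finite(x)]                  # drop NA and infinite values
  N <- length(x)
  d <- x - mean(x)                      # deviations from the sample mean
  K1 <- N * sum(d^4) / sum(d^2)^2 - 3
  switch(as.character(type),
    "1" = K1,
    "2" = ((N + 1) * K1 + 6) * (N - 1) / ((N - 2) * (N - 3)),  # requires N > 3
    "3" = (1 - 1 / N)^2 * (K1 + 3) - 3,
    stop("type must be 1, 2, or 3")
  )
}

x <- 1:5                                # hypothetical sample
sapply(1:3, function(t) kurt_sketch(x, type = t))
```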

**TODO(RETO)**: Add empirical distribution function information (step-function 1/N)

**Probability density function (p.d.f)**:

Note that `quantile()` acts as the inverse of `cdf()`.
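The relationship between an empirical distribution function and its quantiles can be illustrated with base R's `ecdf()` and `stats::quantile()`. This is a sketch on made-up data; whether the package's `cdf()` and `quantile()` methods use the same conventions (for example the `type = 1` quantile definition chosen here) is an assumption.

```r
# Empirical CDF as a step function with jumps of 1/N at each observation.
y  <- c(3.2, 1.5, 4.8, 2.7, 0.9)        # hypothetical observations
Fn <- ecdf(y)

Fn(2.7)                                  # share of observations <= 2.7
Fn(10)                                   # everything lies below 10

# stats::quantile() with type = 1 inverts the empirical CDF:
# it returns the smallest observation q with Fn(q) >= p.
p <- 0.5
q <- quantile(y, probs = p, type = 1, names = FALSE)
q
Fn(q) >= p
```

Because the empirical CDF is a step function, the round trip `Fn(q)` generally returns a value at or above `p` rather than `p` itself.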

References

Joanes DN and Gill CA (1998). “Comparing Measures of Sample Skewness and Kurtosis.” Journal of the Royal Statistical Society D, 47(1), 183–189. doi:10.1111/1467-9884.00122

Examples


require("distributions3")
#> Loading required package: distributions3
#> 
#> Attaching package: ‘distributions3’
#> The following object is masked from ‘package:stats’:
#> 
#>     Gamma
#> The following object is masked from ‘package:grDevices’:
#> 
#>     pdf
set.seed(28)

X <- Empirical(rnorm(50))
X
#> [1] "Empirical distribution (Min. -2.100, Max.  2.187, N = 50)"

mean(X)
#> [1] -0.09838857
variance(X)
#> [1] 1.076242
skewness(X)
#> [1] 0.1027858
kurtosis(X)
#> [1] -0.5339262

random(X, 10)
#>  [1]  0.62280108 -1.66020539 -0.06429479 -0.61645815  0.14298835 -1.85883315
#>  [7] -0.82054223 -1.66020539 -0.88294400 -0.43544484

pdf(X, 2)
#> [1] 0.04
log_pdf(X, 2)
#> [1] -3.218876

cdf(X, 4)
#> [1] 1
quantile(X, 0.7)
#> [1] 0.3600124

### example: allowed types/classes of input arguments

## Single vector (will be coerced to numeric)
Y1  <- rnorm(3, mean = -10)
d1 <- Empirical(Y1)
d1
#> [1] "Empirical distribution (Min. -10.70, Max.  -9.95, N = 3)"
mean(d1)
#> [1] -10.28573

## Unnamed list of vectors
Y2 <- list(as.character(rnorm(3, mean = -10)),
           runif(6),
           rpois(4, lambda = 15))
d2 <- Empirical(Y2)
d2
#> [1] "Empirical distribution (Min. -10.6917, Max.  -8.1584, N = 3)"
#> [2] "Empirical distribution (Min.   0.2365, Max.   0.8445, N = 6)"
#> [3] "Empirical distribution (Min.  13.0000, Max.  22.0000, N = 4)"
mean(d2)
#> [1] -9.7327191  0.5375046 17.5000000

## Named list of vectors
Y3 <- list("Normal"  = as.character(rnorm(3, mean = -10)),
           "Uniform" = runif(6),
           "Poisson" = rpois(4, lambda = 15))
d3 <- Empirical(Y3)
d3
#>                                                         Normal 
#> "Empirical distribution (Min. -11.1410, Max.  -8.4768, N = 3)" 
#>                                                        Uniform 
#> "Empirical distribution (Min.   0.1372, Max.   0.9940, N = 6)" 
#>                                                        Poisson 
#> "Empirical distribution (Min.  16.0000, Max.  22.0000, N = 4)" 
mean(d3)
#>      Normal     Uniform     Poisson 
#> -10.0322492   0.5316866  18.2500000 

## Matrix or data.frame
Y4 <- matrix(rnorm(20), ncol = 5, dimnames = list(sprintf("D_%d", 1:4), sprintf("obs_%d", 1:5)))
d4 <- Empirical(Y4)
d4
#>                                                          D_1 
#> "Empirical distribution (Min. -0.2841, Max.  1.0164, N = 5)" 
#>                                                          D_2 
#> "Empirical distribution (Min. -0.6239, Max.  1.1759, N = 5)" 
#>                                                          D_3 
#> "Empirical distribution (Min. -2.3085, Max.  1.7337, N = 5)" 
#>                                                          D_4 
#> "Empirical distribution (Min. -1.5264, Max.  2.4897, N = 5)" 
d5 <- Empirical(as.data.frame(Y4))
d5
#>                                                        obs_1 
#> "Empirical distribution (Min. -2.3085, Max. -0.2841, N = 4)" 
#>                                                        obs_2 
#> "Empirical distribution (Min. -1.1744, Max.  1.1759, N = 4)" 
#>                                                        obs_3 
#> "Empirical distribution (Min. -0.1304, Max.  2.4897, N = 4)" 
#>                                                        obs_4 
#> "Empirical distribution (Min. -1.5264, Max.  1.0164, N = 4)" 
#>                                                        obs_5 
#> "Empirical distribution (Min.  0.1128, Max.  1.6950, N = 4)"