Calculates the indices of the values in a vector that exceed a specified statistical outlier criterion. The criterion is defined differently depending on the type of outlier detection used (see Details).

Usage

get_outliers(
  x,
  n_sigma = 3,
  mad_crit = 6,
  fold_crit = 5,
  type = c("nonparametric", "parametric")
)

Arguments

x

numeric(n). A vector of values to evaluate.

n_sigma

numeric(1). The number of standard deviations from the mean to use as the threshold for outliers. Ignored if type = "nonparametric".

mad_crit

The median absolute deviation ("MAD") criterion to use. Ignored if type = "parametric". Defaults to (6 * mad).

fold_crit

The fold-change criterion to evaluate. Ignored if type = "parametric". Defaults to 5x.

type

character(1). Matched, so an unambiguous abbreviation such as "para" is accepted. Either "parametric" or "nonparametric", selecting the type of outlier detection to use.

Value

If "nonparametric": an integer vector of indices corresponding to detected outliers.

If "parametric": an integer vector of indices with these additional attributes:

mu

the robustly fit mean (\(\mu\)).

sigma

the robustly fit standard deviation (\(\sigma\)).

crit

the 2 critical values beyond which a value is considered an outlier.
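
As a rough illustration of how these attributes relate, the sketch below builds a mock return value by hand from the printed output in the Examples section (the package itself is not loaded, so `out` is a stand-in, not an actual call to get_outliers()). It assumes the default n_sigma = 3.

```r
# Mock of the parametric return value, copied from the printed example output
out <- structure(
  c(1L, 2L, 29L, 30L),
  mu    = 14.66106,
  sigma = 2.237713,
  crit  = c(7.947922, 21.374201)
)

# Attributes are retrieved with attr()
mu    <- attr(out, "mu")
sigma <- attr(out, "sigma")

# The critical values are mu -/+ n_sigma * sigma (n_sigma = 3 assumed here);
# this agrees with attr(out, "crit") up to print rounding
recomputed <- mu + c(-1, 1) * 3 * sigma

# Drop the attributes to get a plain index vector for subsetting
idx <- as.integer(out)
```

The plain index vector `idx` can then be used to subset the original data, e.g. `x[idx]`.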

outlier detection

There are 2 possible methods used to define an outlier measurement and the return value depends on which method is implemented:

  1. The non-parametric case (default): agnostic to the distribution. A measurement is an outlier if it falls more than mad_crit * mad from the median and also differs from the median by more than the fold-change criterion (fold_crit; e.g. 5x).
    Note: n_sigma is ignored.

  2. The parametric case: the mean and standard deviation are calculated robustly via fit_gauss(). Outliers are defined as measurements falling outside +/- n_sigma * \(\sigma\) from the estimated \(\mu\).
    Note: mad_crit and fold_crit are ignored.
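
The two criteria above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package source: the function names `sketch_nonparametric()` and `sketch_parametric()` are hypothetical, the fold-change is assumed to be the ratio to the median in either direction, and `median()`/`mad()` stand in for the robust fit that fit_gauss() performs.

```r
# Hypothetical re-implementations for illustration only
sketch_nonparametric <- function(x, mad_crit = 6, fold_crit = 5) {
  med <- stats::median(x)
  dev <- stats::mad(x)                        # scaled MAD (constant = 1.4826)
  far_stat <- abs(x - med) > mad_crit * dev   # statistical criterion
  # Assumption: fold-change is the ratio to the median, in either direction
  far_fold <- (x / med > fold_crit) | (med / x > fold_crit)
  which(far_stat & far_fold)                  # both criteria must hold
}

sketch_parametric <- function(x, n_sigma = 3) {
  mu    <- stats::median(x)   # stand-in for the robust mean from fit_gauss()
  sigma <- stats::mad(x)      # stand-in for the robust standard deviation
  crit  <- mu + c(-1, 1) * n_sigma * sigma
  structure(which(x < crit[1] | x > crit[2]),
            mu = mu, sigma = sigma, crit = crit)
}

# Small deterministic vector: 2 low and 2 high extreme values
x <- c(2, 2.5, 14, 14.5, 14.8, 15, 15.2, 15.5, 16, 25, 25.9)

np  <- sketch_nonparametric(x)   # flags only the 2 low values (indices 1, 2)
par <- sketch_parametric(x)      # flags all 4 extremes (indices 1, 2, 10, 11)
```

In this sketch the high values pass the MAD criterion but are less than 5-fold above the median, so only the parametric version flags them. That mirrors the pattern in the Examples section, where the non-parametric call returns fewer indices than the parametric one.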

Author

Stu Field

Examples

withr::with_seed(101, {
  x <- rnorm(26, 15, 2)         # Gaussian
  x <- c(2, 2.5, x, 25, 25.9)   # add 4 outliers (2hi, 2lo)
})
get_outliers(x)                # non-parametric (default)
#> [1] 1 2
get_outliers(x, type = "para") # parametric
#> [1]  1  2 29 30
#> attr(,"mu")
#> [1] 14.66106
#> attr(,"sigma")
#> [1] 2.237713
#> attr(,"crit")
#> [1]  7.947922 21.374201