Remove Statistical Outliers

Remove statistical outliers from (optionally) paired vectors of numeric values. By default, outliers are identified using a statistical criterion based on the median absolute deviation (\(6 \times mad\)) combined with a fold-change criterion (5x the median). See the "outlier detection" section for more information about the type argument.

Usage

remove_outliers(x, y = NULL, type = "nonparametric", ...)

Arguments

x: numeric(n). A vector of numeric values.
y: Optional. If NULL, the data are assumed to be non-paired and outlier analysis is performed on the values in x alone. If not NULL, either a numeric vector or a character vector (e.g. class names), in the same order as x, indicating the pairing.
type: character(1). Matched. Either "parametric" or "nonparametric", determining which outlier detection method is used.
...: Additional arguments passed to get_outliers().

Value

A tibble with columns x and y representing each numeric vector pair with statistical outliers removed.

outlier detection

There are two possible methods for defining an outlier measurement, and the return value depends on which method is used:

The non-parametric case (default): agnostic to the distribution. Outlier measurements are defined as falling more than mad_crit * mad from the median and more than a specified number of fold-changes from the median (fold_crit; e.g. \(5x\)).
Note: n_sigma is ignored.
The parametric case: the mean and standard deviation are estimated robustly via fit_gauss(). Outliers are defined as measurements falling outside +/- n_sigma * \(\sigma\) from the estimated \(\mu\).
Note: mad_crit and fold_crit are ignored.
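
The two criteria above can be sketched in base R as follows. This is only an illustration, not the package's get_outliers() implementation: flag_outliers() is a hypothetical helper, the defaults mad_crit = 6 and fold_crit = 5 follow the description at the top of this page, n_sigma = 3 is an assumed default, the fold-change test is assumed to apply in both directions to positive values, and the median and scaled MAD stand in for the robust estimates that fit_gauss() would provide.

```r
# Hypothetical helper illustrating the two outlier definitions.
# Returns a logical vector: TRUE where a value is flagged as an outlier.
flag_outliers <- function(x, type = c("nonparametric", "parametric"),
                          mad_crit = 6, fold_crit = 5, n_sigma = 3) {
  type <- match.arg(type)            # partial matching, as for `type` above
  med  <- stats::median(x)
  dev  <- stats::mad(x)              # scaled by 1.4826 by default
  if (type == "nonparametric") {
    # Statistical criterion: more than mad_crit * mad from the median
    stat_out <- abs(x - med) > mad_crit * dev
    # Fold-change criterion (assumed two-sided, positive values only)
    fold_out <- x > fold_crit * med | x < med / fold_crit
    stat_out & fold_out              # both criteria must be met
  } else {
    # Assumption: median and MAD stand in for the robust mu and sigma
    abs(x - med) > n_sigma * dev
  }
}

x <- c(10, 11, 9, 10.5, 9.5, 10, 10, 16, 100)
which(flag_outliers(x))                       # 9
which(flag_outliers(x, type = "parametric"))  # 8 9
```

In this sketch the value 16 lies more than 6 * mad from the median but is well within a 5x fold change, so only the parametric rule flags it.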

Author

Stu Field

Examples
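
The following is a minimal, self-contained sketch of paired outlier removal rather than output from the package itself. is_outlier() is a hypothetical stand-in for the non-parametric detection step, the rule that a pair is dropped when either member is flagged is an assumption, and a base data.frame is used in place of a tibble to avoid extra dependencies.

```r
# Hypothetical stand-in for the non-parametric detection step
is_outlier <- function(v, mad_crit = 6, fold_crit = 5) {
  med <- stats::median(v)
  abs(v - med) > mad_crit * stats::mad(v) &
    (v > fold_crit * med | v < med / fold_crit)
}

# Paired numeric vectors, each containing one extreme value
x <- c(10, 11, 9, 10.5, 9.5, 10, 10, 16, 100)
y <- c(20, 21, 19, 500, 20.5, 19.5, 20, 20, 21)

# Assumption: drop the whole pair if either x or y is flagged
keep    <- !(is_outlier(x) | is_outlier(y))
cleaned <- data.frame(x = x[keep], y = y[keep])
cleaned
```

Here pairs 4 and 9 are dropped, leaving 7 rows. With the package loaded, the corresponding call would presumably be remove_outliers(x, y), following the Usage section.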