Remove Statistical Outliers
remove_outliers.RdRemove statistical outliers from (optionally) paired vectors of numeric
values using a statistical criterion based on median absolute deviation
(\(6 \times mad\)) and a fold change criterion (5x the median).
See "outlier detection" section for more information about
the type = argument specification.
Arguments
- x
numeric(n). A vector of numeric values.- y
Optional. If
NULL, assume non-paired data and performs outlier analysis on values inxalong. If notNULL, either a numeric vector or character vector (e.g. class names) ordered in the same order asxindicating the pairing.- type
character(1). Matched. Either "parametric" or "nonparametric" to determine the type of outliers detection implementation.- ...
Additional arguments passed to
get_outliers().
Value
A tibble with columns x and y representing each
numeric vector pair with statistical outliers removed.
outlier detection
There are 2 possible methods used to define an outlier measurement and the return value depends on which method is implemented:
The non-parametric case (default): agnostic to the distribution. Outlier measurements are defined as falling outside
mad_crit * madfrom the median and a specified number of fold-changes from the median (i.e.fold_crit; e.g. \(5x\)).
Note:n_sigmais ignored.The parametric case: the mean and standard deviation are calculated robustly via
fit_gauss(). Outliers are defined as measurements falling outside +/-n_sigma* \(\sigma\) from the the estimated \(\mu\).
Note:mad_critandfold_critare ignored.
Examples
x <- withr::with_seed(101, rnorm(10, mean = 1000, sd = 2))
x <- c(x, 10000) # create outlier (11L)
x1 <- remove_outliers(x) # 'x' only; no 'y'
x1
#> # A tibble: 10 × 2
#> x y
#> <dbl> <dbl>
#> 1 999. NA
#> 2 1001. NA
#> 3 999. NA
#> 4 1000. NA
#> 5 1001. NA
#> 6 1002. NA
#> 7 1001. NA
#> 8 1000. NA
#> 9 1002. NA
#> 10 1000. NA
y <- head(LETTERS, length(x)) # paired 'x' and 'y'
x2 <- remove_outliers(x, y) # final row removed
x2
#> # A tibble: 10 × 2
#> x y
#> <dbl> <chr>
#> 1 999. A
#> 2 1001. B
#> 3 999. C
#> 4 1000. D
#> 5 1001. E
#> 6 1002. F
#> 7 1001. G
#> 8 1000. H
#> 9 1002. I
#> 10 1000. J