Remove Statistical Outliers
Remove statistical outliers from (optionally) paired vectors of numeric
values using a statistical criterion based on median absolute deviation
(\(6 \times mad\)) and a fold change criterion (5x the median).
See the "outlier detection" section for more information about the type = argument.
Arguments
- x
numeric(n). A vector of numeric values.
- y
Optional. If NULL, the data are assumed to be non-paired and the outlier analysis is performed on the values in x alone. If not NULL, either a numeric vector or a character vector (e.g. class names), in the same order as x, indicating the pairing.
- type
character(1). Matched. Either "parametric" or "nonparametric"; determines which outlier detection method is used.
- ...
Additional arguments passed to get_outliers().
Value
A tibble with columns x and y, representing each numeric vector pair with statistical outliers removed.
outlier detection
There are 2 possible methods used to define an outlier measurement, and the return value depends on which method is used:
- The non-parametric case (default): agnostic to the distribution. Outlier measurements are defined as falling outside mad_crit * mad from the median and a specified number of fold-changes from the median (i.e. fold_crit; e.g. \(5x\)). Note: n_sigma is ignored.
- The parametric case: the mean and standard deviation are calculated robustly via fit_gauss(). Outliers are defined as measurements falling outside +/- n_sigma * \(\sigma\) from the estimated \(\mu\). Note: mad_crit and fold_crit are ignored.
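The two criteria above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: flag_nonparametric() and flag_parametric() are hypothetical helpers, the median and scaled MAD stand in for the robust estimates that fit_gauss() would return, the n_sigma default of 3 is assumed, and the fold-change test assumes positive measurements. Whether the package scales the MAD is not stated here, so the unscaled MAD is used for the non-parametric case.

```r
# Hypothetical helper: non-parametric criterion (sketch only).
# A value is flagged when it fails BOTH the MAD and the fold-change tests.
flag_nonparametric <- function(x, mad_crit = 6, fold_crit = 5) {
  med <- stats::median(x, na.rm = TRUE)
  # constant = 1 gives the raw (unscaled) MAD; this is an assumption
  dev <- stats::mad(x, center = med, constant = 1, na.rm = TRUE)
  stat_out <- abs(x - med) > mad_crit * dev
  # fold change relative to the median (assumes positive values)
  fold_out <- x > fold_crit * med | x < med / fold_crit
  stat_out & fold_out
}

# Hypothetical helper: parametric criterion (sketch only).
# mu and sigma are robust stand-ins, NOT the output of fit_gauss().
flag_parametric <- function(x, n_sigma = 3) {
  mu    <- stats::median(x, na.rm = TRUE)
  sigma <- stats::mad(x, center = mu, na.rm = TRUE)  # scaled MAD
  abs(x - mu) > n_sigma * sigma
}

x <- c(999, 1001, 999, 1000, 1001, 1002, 1001, 1000, 1002, 1000, 10000)

np_flag <- flag_nonparametric(x)
p_flag  <- flag_parametric(x)

which(np_flag)  # position(s) flagged by the non-parametric sketch
which(p_flag)   # position(s) flagged by the parametric sketch
x[!np_flag]     # values retained after removal
```

Requiring both tests in the non-parametric case keeps tightly clustered data from losing points that exceed the MAD threshold but are still close to the median in fold-change terms.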
Examples
x <- withr::with_seed(101, rnorm(10, mean = 1000, sd = 2))
x <- c(x, 10000) # create outlier (11L)
x1 <- remove_outliers(x) # 'x' only; no 'y'
x1
#> # A tibble: 10 × 2
#> x y
#> <dbl> <dbl>
#> 1 999. NA
#> 2 1001. NA
#> 3 999. NA
#> 4 1000. NA
#> 5 1001. NA
#> 6 1002. NA
#> 7 1001. NA
#> 8 1000. NA
#> 9 1002. NA
#> 10 1000. NA
y <- head(LETTERS, length(x)) # paired 'x' and 'y'
x2 <- remove_outliers(x, y) # final row removed
x2
#> # A tibble: 10 × 2
#> x y
#> <dbl> <chr>
#> 1 999. A
#> 2 1001. B
#> 3 999. C
#> 4 1000. D
#> 5 1001. E
#> 6 1002. F
#> 7 1001. G
#> 8 1000. H
#> 9 1002. I
#> 10 1000. J