Feature Selection Object Declaration
Declares and generates a feature_select class object within the
Feature Selection framework. This object acts as the holder of data
(bootstrapped or cross-validation folds), model type, search type,
cost function, and an underlying data structure for use by other functions.
Usage
feature_selection(
  data,
  candidate_features,
  model_type,
  search_type,
  runs = 1L,
  folds = 1L,
  cost = c("AUC", "R2", "CCC", "MSE", "sens", "spec"),
  bootstrap = FALSE,
  stratify = FALSE,
  strat_column = NULL,
  random_seed = 101L
)
# S3 method for class 'feature_select'
print(x, ...)
is_feature_select(x)
# S3 method for class 'feature_select'
update(object, ...)
Arguments
- data
A data.frame containing features and clinical data suitable for modeling.
- candidate_features
character(n). Candidate features, i.e. column names, from the data object.
- model_type
An instantiated model_type object, generated via a call to one of the model_type() functions.
- search_type
An instantiated search_type object, generated via a call to one of the search_type() functions.
- runs
integer(1). Number of runs (repeats) to perform.
- folds
integer(1). Number of cross-validation folds to use.
- cost
character(1). A string used to define the cost function. One of:
  AUC: Area Under the Curve
  MSE: Mean-Squared Error
  CCC: Concordance Correlation Coefficient
  R2: R-squared (regression models)
  sens or spec: Sensitivity + Specificity
- bootstrap
logical(1). Should data be bootstrapped rather than set up in cross-validation folds? The result is multiple runs (defined by runs) with 1 fold each. The full data set is sampled with replacement to generate a training set of equivalent size. The samples not chosen during sampling make up the test set.
- stratify
logical(1). Should cross-validation folds be stratified based upon the column specified in strat_column?
- strat_column
character(1). Which column to use for stratification of cross-validation. If NULL (default), the column name corresponding to the response parameter from the ?model_type will be used.
- random_seed
integer(1). Used to control the random number generator for reproducibility.
- x, object
A feature_select class object.
- ...
Arguments declared for update in argument = value format. Non-declared arguments from the original call are preserved.
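The bootstrap argument above describes sampling the full data set with replacement and using the rows never drawn as the test set. A minimal base R sketch of that idea follows. The helper bootstrap_split(), the element names training and test, and the per-run seeding are assumptions made for illustration only; the package's internal implementation may differ.

```r
# Hypothetical helper (not part of the package): one bootstrap split of n rows
bootstrap_split <- function(n, seed = 101L) {
  set.seed(seed)
  # draw n row indices with replacement: training set equal in size to the data
  train <- sample.int(n, size = n, replace = TRUE)
  # rows never drawn make up the test set
  test <- setdiff(seq_len(n), train)
  list(training = train, test = test)
}

# One split per run, i.e. several runs with a single "fold" each;
# offsetting the seed per run is an assumption for this sketch
splits <- lapply(1:5, function(run) bootstrap_split(100L, seed = 101L + run))

# Size of the training and test index vectors for the first run
lengths(splits[[1]])
```

Because rows can be drawn more than once, the training indices contain duplicates, and the test set size varies from run to run.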
Value
A "feature_select" class object; a list of:
- data
The original feature data to use.
- candidate_features
The list of candidate features.
- model_type
A list containing model type variables of the appropriate class for the desired model type.
- search_type
A list containing search type variables of the appropriate class for the desired search type.
- cost
A string naming the type of cost function.
- cost_fxn
A list containing cost variables of the appropriate class for the desired cost function.
- runs
The number of runs.
- folds
The number of folds.
- random_seed
The random seed used.
- cross_val
A list containing the training and test indices of the various cross-validation folds.
- search_complete
Logical indicating whether the object has completed a search.
- call
The original matched call.
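To make the cross_val element more concrete, the following base R sketch builds one plausible layout: a list of runs, each holding a list of folds with training and test row indices, with fold membership optionally balanced within the levels of a stratification variable. The helper make_folds(), the element names, and the nesting are assumptions for illustration; the structure the package actually stores may differ.

```r
# Hypothetical helper (not part of the package): assign each of n rows to one
# of k folds, optionally balancing membership within the levels of `strata`
make_folds <- function(n, k, strata = NULL) {
  if (is.null(strata)) {
    return(sample(rep_len(seq_len(k), n)))
  }
  fold_id <- integer(n)
  for (idx in split(seq_len(n), strata)) {
    # shuffle fold labels separately inside each stratum
    fold_id[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  fold_id
}

set.seed(101L)
n <- 100L
y <- factor(rep(c("case", "control"), times = c(30L, 70L)))

# 2 runs x 5 folds; each fold stores its training and test row indices
cross_val <- lapply(1:2, function(run) {
  fold_id <- make_folds(n, k = 5L, strata = y)
  lapply(seq_len(5L), function(f) {
    list(training = which(fold_id != f), test = which(fold_id == f))
  })
})

# With stratification, every test fold holds 6 cases and 14 controls
table(y[cross_val[[1]][[1]]$test])
```

Without stratification (strata = NULL) the fold sizes are still equal, but the class mix inside each fold is left to chance.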
References
Hastie, Tibshirani, and Friedman. Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd Ed. Springer. 2009.
Examples
# Simulated Test Data
data <- wranglr::simdata
# Setup response variable
data$class_response <- factor(data$class_response)
mt <- model_type_lr("class_response")
sm <- search_type_forward_model(15L, display_name = "Forward Algorithm")
ft <- helpr:::get_analytes(data) # select candidate features
fs <- feature_selection(data, candidate_features = ft,
                        model_type = mt, search_type = sm, cost = "sens",
                        runs = 5L, folds = 5L)
# S3 Print method
fs
#> ══ Feature Selection Object ═══════════════════════════════════════════
#> ── Dataset Info ───────────────────────────────────────────────────────
#> • Rows 100
#> • Columns 55
#> • FeatureData 40
#> ── Search Optimization Info ───────────────────────────────────────────
#> • No. Candidates '40'
#> • Response Field 'class_response'
#> • Cross Validation Runs '5'
#> • Cross Validation Folds '5'
#> • Stratified Folds 'FALSE'
#> • Model Type 'fs_lr'
#> • Search Type 'fs_forward_model'
#> • Cost Function 'sens'
#> • Random Seed '101'
#> • Display Name 'Forward Algorithm'
#> • Search Complete 'FALSE'
#> ═══════════════════════════════════════════════════════════════════════
# Using the S3 Update method to modify existing `feature_select` object:
# change model type, cost function, and random seed
fs2 <- update(fs, model_type = model_type_nb("class_response"),
              cost = "AUC", random_seed = 99L)
fs2
#> ══ Feature Selection Object ═══════════════════════════════════════════
#> ── Dataset Info ───────────────────────────────────────────────────────
#> • Rows 100
#> • Columns 55
#> • FeatureData 40
#> ── Search Optimization Info ───────────────────────────────────────────
#> • No. Candidates '40'
#> • Response Field 'class_response'
#> • Cross Validation Runs '5'
#> • Cross Validation Folds '5'
#> • Stratified Folds 'FALSE'
#> • Model Type 'fs_nb'
#> • Search Type 'fs_forward_model'
#> • Cost Function 'AUC'
#> • Random Seed '99'
#> • Display Name 'Forward Algorithm'
#> • Search Complete 'FALSE'
#> ═══════════════════════════════════════════════════════════════════════
# change number of runs & folds
# requires re-calculation of cross-validation parameters
fs3 <- update(fs, runs = 20L, folds = 10L)
fs3
#> ══ Feature Selection Object ═══════════════════════════════════════════
#> ── Dataset Info ───────────────────────────────────────────────────────
#> • Rows 100
#> • Columns 55
#> • FeatureData 40
#> ── Search Optimization Info ───────────────────────────────────────────
#> • No. Candidates '40'
#> • Response Field 'class_response'
#> • Cross Validation Runs '20'
#> • Cross Validation Folds '10'
#> • Stratified Folds 'FALSE'
#> • Model Type 'fs_lr'
#> • Search Type 'fs_forward_model'
#> • Cost Function 'sens'
#> • Random Seed '101'
#> • Display Name 'Forward Algorithm'
#> • Search Complete 'FALSE'
#> ═══════════════════════════════════════════════════════════════════════