Declares and generates a feature_select class object within the Feature Selection framework. This object holds the data (bootstrapped or split into cross-validation folds), model type, search type, cost function, and an underlying data structure used by other functions.

Usage

feature_selection(
  data,
  candidate_features,
  model_type,
  search_type,
  runs = 1L,
  folds = 1L,
  cost = c("AUC", "R2", "CCC", "MSE", "sens", "spec"),
  bootstrap = FALSE,
  stratify = FALSE,
  strat_column = NULL,
  random_seed = 101L
)

# S3 method for class 'feature_select'
print(x, ...)

is_feature_select(x)

# S3 method for class 'feature_select'
update(object, ...)

Arguments

data

A data.frame containing features and clinical data suitable for modeling.

candidate_features

character(n). Candidate features, i.e. column names, from the data object.

model_type

An instantiated model_type object, generated via a call to one of the model_type() functions.

search_type

An instantiated search_type object, generated via a call to one of the search_type() functions.

runs

integer(1). How many runs (repeats) to perform.

folds

integer(1). How many cross-validation folds to use.

cost

character(1). A string to be used in defining the cost function. One of:

AUC

Area Under the Curve

MSE

Mean-Squared Error

CCC

Concordance Correlation Coefficient

R2

R-squared - regression models

sens or spec

Sensitivity + Specificity

bootstrap

logical(1). Should the data be bootstrapped rather than split into cross-validation folds? If TRUE, the result is multiple runs (set by runs) with one fold each. For each run, the full data set is sampled with replacement to generate a training set of the same size; the samples not drawn make up the test set.

stratify

logical(1). Should cross-validation folds be stratified based upon the column specified in strat_column?

strat_column

character(1). Which column to use for stratifying the cross-validation folds. If NULL (default), the column named by the response parameter of the model_type object is used (see ?model_type).

random_seed

integer(1). Used to control the random number generator for reproducibility.

x, object

A feature_select class object.

...

Arguments declared for update in argument = value format. Non-declared arguments from the original call are preserved.
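The resampling controlled by runs, folds, bootstrap, and stratify can be illustrated with a minimal base R sketch. The helper names make_folds() and make_bootstrap(), the toy strata vector, and the list layout are hypothetical and chosen for illustration only; the package's internal implementation may differ.

```r
set.seed(101)   # mirrors the default random_seed, for reproducibility

n      <- 100L
strata <- factor(rep(c("a", "b"), times = c(30, 70)))  # hypothetical strat column

# k-fold cross-validation, stratified so each fold keeps the class balance
make_folds <- function(strata, k) {
  fold_id <- integer(length(strata))
  for (idx in split(seq_along(strata), strata)) {
    idx <- idx[sample.int(length(idx))]               # shuffle within stratum
    fold_id[idx] <- rep_len(seq_len(k), length(idx))  # deal rows out to folds
  }
  lapply(seq_len(k), function(f) {
    list(training = which(fold_id != f), test = which(fold_id == f))
  })
}

# bootstrap: draw n rows with replacement; rows never drawn form the test set
make_bootstrap <- function(n) {
  training <- sample.int(n, n, replace = TRUE)
  list(training = training, test = setdiff(seq_len(n), training))
}

cv   <- lapply(1:3, function(run) make_folds(strata, k = 5L))  # 3 runs x 5 folds
boot <- lapply(1:3, function(run) list(make_bootstrap(n)))     # 3 runs x 1 fold

table(strata[cv[[1]][[1]]$test])   # class counts in the first test fold
```

With bootstrapping, the training set always has n rows but contains duplicates, so the test set size varies from run to run.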

Value

A "feature_select" class object; a list of:

data

The original feature data to use.

candidate_features

The list of candidate features.

model_type

A list containing model type variables of the appropriate class for the desired model type.

search_type

A list containing search type variables of the appropriate class for the desired search type.

cost

A string of the type of cost function.

cost_fxn

A list containing cost variables of the appropriate class for the desired object cost function.

runs

The number of runs.

folds

The number of folds.

random_seed

The random seed used.

cross_val

A list containing the training and test indices of the various cross validation folds.

search_complete

Logical. Whether the object has completed a search.

call

The original matched call.
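To make the cost options concrete, the following base R sketch computes several of the named metrics from toy vectors. The function names (sens_spec(), mse(), ccc(), auc()) are hypothetical, and combining sensitivity and specificity by simple addition is an assumption based on the "Sensitivity + Specificity" description; the package's cost_fxn objects may be implemented differently.

```r
# truth and pred are 0/1 vectors, with 1 as the positive class
sens_spec <- function(truth, pred) {
  sens <- sum(pred == 1 & truth == 1) / sum(truth == 1)
  spec <- sum(pred == 0 & truth == 0) / sum(truth == 0)
  c(sens = sens, spec = spec, total = sens + spec)
}

# Mean-squared error between observed and fitted values
mse <- function(obs, fit) mean((obs - fit)^2)

# Lin's concordance correlation coefficient, using population (1/n) moments
ccc <- function(obs, fit) {
  mx  <- mean(obs)
  my  <- mean(fit)
  sxy <- mean((obs - mx) * (fit - my))
  2 * sxy / (mean((obs - mx)^2) + mean((fit - my)^2) + (mx - my)^2)
}

# AUC via the rank-sum (Mann-Whitney) identity; score is a numeric prediction
auc <- function(truth, score) {
  r  <- rank(score)
  n1 <- sum(truth == 1)
  n0 <- sum(truth == 0)
  (sum(r[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

truth <- c(1, 1, 1, 0, 0, 0, 0, 1)
pred  <- c(1, 0, 1, 0, 0, 1, 0, 1)
sens_spec(truth, pred)
```

Note that MSE is minimized while the other metrics are maximized, so a search routine has to know the direction of each cost.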

Functions

  • is_feature_select(): Checks whether x is a valid feature_select class object.
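A class predicate of this kind is commonly built on inherits(). The sketch below uses a hypothetical stand-in, has_fs_class(), on a bare classed list; the package's own is_feature_select() may validate more than the class attribute.

```r
# Hypothetical stand-in for a class check; tests only the S3 class attribute
has_fs_class <- function(x) inherits(x, "feature_select")

obj <- structure(list(), class = "feature_select")  # bare object, for illustration

has_fs_class(obj)      # object carrying the class
has_fs_class(list())   # plain list without the class
```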

References

Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer.

Author

Stu Field, Kirk DeLisle

Examples

# Simulated Test Data
data <- wranglr::simdata

# Setup response variable
data$class_response <- factor(data$class_response)

mt <- model_type_lr("class_response")
sm <- search_type_forward_model(15L, display_name = "Forward Algorithm")
ft <- helpr:::get_analytes(data)   # select candidate features
fs <- feature_selection(data, candidate_features = ft,
                        model_type = mt, search_type = sm, cost = "sens",
                        runs = 5L, folds = 5L)
# S3 Print method
fs
#> ══ Feature Selection Object ═══════════════════════════════════════════
#> ── Dataset Info ───────────────────────────────────────────────────────
#>  Rows                      100
#>  Columns                   55
#>  FeatureData               40
#> ── Search Optimization Info ───────────────────────────────────────────
#>  No. Candidates            '40'
#>  Response Field            'class_response'
#>  Cross Validation Runs     '5'
#>  Cross Validation Folds    '5'
#>  Stratified Folds          'FALSE'
#>  Model Type                'fs_lr'
#>  Search Type               'fs_forward_model'
#>  Cost Function             'sens'
#>  Random Seed               '101'
#>  Display Name              'Forward Algorithm'
#>  Search Complete           'FALSE'
#> ═══════════════════════════════════════════════════════════════════════

# Using the S3 Update method to modify existing `feature_select` object:
#   change model type, cost function, and random seed
fs2 <- update(fs, model_type = model_type_nb("class_response"),
              cost = "AUC", random_seed = 99L)
fs2
#> ══ Feature Selection Object ═══════════════════════════════════════════
#> ── Dataset Info ───────────────────────────────────────────────────────
#>  Rows                      100
#>  Columns                   55
#>  FeatureData               40
#> ── Search Optimization Info ───────────────────────────────────────────
#>  No. Candidates            '40'
#>  Response Field            'class_response'
#>  Cross Validation Runs     '5'
#>  Cross Validation Folds    '5'
#>  Stratified Folds          'FALSE'
#>  Model Type                'fs_nb'
#>  Search Type               'fs_forward_model'
#>  Cost Function             'AUC'
#>  Random Seed               '99'
#>  Display Name              'Forward Algorithm'
#>  Search Complete           'FALSE'
#> ═══════════════════════════════════════════════════════════════════════

# change number of runs & folds
#   requires re-calculation of cross-validation parameters
fs3 <- update(fs, runs = 20L, folds = 10L)
fs3
#> ══ Feature Selection Object ═══════════════════════════════════════════
#> ── Dataset Info ───────────────────────────────────────────────────────
#>  Rows                      100
#>  Columns                   55
#>  FeatureData               40
#> ── Search Optimization Info ───────────────────────────────────────────
#>  No. Candidates            '40'
#>  Response Field            'class_response'
#>  Cross Validation Runs     '20'
#>  Cross Validation Folds    '10'
#>  Stratified Folds          'FALSE'
#>  Model Type                'fs_lr'
#>  Search Type               'fs_forward_model'
#>  Cost Function             'sens'
#>  Random Seed               '101'
#>  Display Name              'Forward Algorithm'
#>  Search Complete           'FALSE'
#> ═══════════════════════════════════════════════════════════════════════
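
The update() calls above keep every argument that is not re-declared. One common way to get this behavior in R is to store the matched call and re-evaluate it with the new arguments spliced in. The sketch below shows that pattern on a toy class; new_toy() and the toy_select class are hypothetical, and this is not necessarily how the package implements its method.

```r
# Hypothetical toy constructor that records its own matched call
new_toy <- function(runs = 1L, folds = 1L, cost = "AUC") {
  structure(
    list(runs = runs, folds = folds, cost = cost, call = match.call()),
    class = "toy_select"
  )
}

# S3 update method: overwrite or add the declared arguments, then re-evaluate
update.toy_select <- function(object, ...) {
  new_args <- list(...)
  call <- object$call
  for (nm in names(new_args)) {
    call[[nm]] <- new_args[[nm]]   # arguments not named here stay untouched
  }
  eval(call, parent.frame())
}

toy  <- new_toy(runs = 5L, folds = 5L)
toy2 <- update(toy, cost = "MSE", runs = 20L)

toy2$call   # the rebuilt call, with folds carried over from the original
```

Re-evaluating the call, rather than editing list elements in place, means any derived state (such as cross-validation indices) is regenerated consistently with the new arguments.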