
These stepwise search methods identify a locally optimal model complexity through greedy search. They either build up (forward) or break down (backward) a model one covariate at a time, based on the results of all "Runs" and "Folds" of the cross-validation sets. For example, with 5 runs of 5-fold cross-validation, 25 evaluations are made to determine which covariate yields the best average performance (for a given cost function). See Details for more information on options.
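The run-by-fold bookkeeping described above can be sketched in base R. This is only an illustration, not the package's internal code: the sample size `n` and the layout of the `cross_val` list are assumptions made for the example.

```r
# Illustrative only: base R, not the package's internal implementation.
set.seed(101)
n     <- 100L  # hypothetical number of samples
runs  <- 5L
folds <- 5L

# For each run, randomly assign every sample to one of `folds` folds,
# then store the training and test indices of each fold.
cross_val <- lapply(seq_len(runs), function(r) {
  fold_id <- sample(rep_len(seq_len(folds), n))
  lapply(seq_len(folds), function(k) {
    list(train = which(fold_id != k), test = which(fold_id == k))
  })
})

# One evaluation per run/fold pair: 5 runs x 5 folds = 25 evaluations.
n_evals <- sum(lengths(cross_val))
n_evals
```

Within a run, every sample lands in exactly one test fold, so each run scores the whole data set once.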

Usage

Search(x, num_cores)

Arguments

x

A feature_select class object from a call to feature_selection().

num_cores

integer(1). How many cores to use during the search. Defaults to 1L, which does not use parallel processing. Values > 1 are only available on Linux systems. Note that parallel processing is implemented over the runs, so choose the number of cores accordingly.
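To illustrate what parallelizing over the runs implies, here is a minimal sketch using `parallel::mclapply()`. It is an assumption that the package works along these lines, and `one_run()` is a hypothetical stand-in for the work done in a single cross-validation run. If the run is the unit of work, requesting more cores than runs would leave the extra cores idle.

```r
library(parallel)

runs <- 4L

# Hypothetical stand-in for the work done in one cross-validation run.
one_run <- function(r) {
  set.seed(r)
  mean(rnorm(1000))
}

# mclapply() relies on forking, which is unavailable on Windows,
# so fall back to a single core there. Never request more cores than runs.
cores <- if (.Platform$OS.type == "unix") min(runs, 2L) else 1L

# Each run is dispatched as one task; results come back in run order.
res <- mclapply(seq_len(runs), one_run, mc.cores = cores)
length(res)
```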

Value

A "feature_select" class object; a list of:

data

The original feature data to use.

candidate_features

The list of candidate features.

model_type

A list containing model type variables of the appropriate class for the desired model type.

search_type

A list containing search type variables of the appropriate class for the desired search type.

cost

A string naming the type of cost function.

cost_fxn

A list containing cost variables of the appropriate class for the desired cost function.

runs

The number of runs.

folds

The number of folds.

random_seed

The random seed used.

cross_val

A list containing the training and test indices of the various cross validation folds.

search_complete

Logical. Whether the object has completed a search.

call

The original matched call.

Details

There are currently 2 search options, both of which are "greedy" algorithms:

Forward Model Search:

The covariate found in the first step carries through to all other steps. Likewise, the second covariate found (in combination with the first) also carries through. The result is a single model determined to be locally optimal based upon the performances across all runs and folds.
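The forward procedure can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the simulated data, the helper names `cv_cost()` and `forward_search()`, and the choice of a linear model with mean squared error as the cost are all hypothetical.

```r
# Illustrative sketch in base R; all names below are hypothetical.
set.seed(101)
n   <- 200L
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 2 * dat$x1 - 1.5 * dat$x3 + rnorm(n)  # only x1 and x3 carry signal

# Fix the fold assignments up front so every candidate sees the same splits.
runs  <- 3L
folds <- 4L
fold_ids <- replicate(runs, sample(rep_len(seq_len(folds), n)))  # n x runs

# Mean test-set MSE of `y ~ features` across all runs and folds.
cv_cost <- function(features, data, fold_ids) {
  costs <- numeric(0)
  for (r in seq_len(ncol(fold_ids))) {
    for (k in seq_len(max(fold_ids))) {
      test  <- fold_ids[, r] == k
      fit   <- lm(reformulate(features, response = "y"), data = data[!test, ])
      pred  <- predict(fit, newdata = data[test, ])
      costs <- c(costs, mean((data$y[test] - pred)^2))
    }
  }
  mean(costs)
}

# Greedy forward search: at each step, add the candidate that gives the
# lowest average cost in combination with the features already selected.
forward_search <- function(candidates, data, fold_ids) {
  selected <- character(0)
  cost     <- numeric(0)
  while (length(candidates) > 0L) {
    costs <- vapply(candidates,
                    function(f) cv_cost(c(selected, f), data, fold_ids),
                    numeric(1))
    best       <- names(which.min(costs))
    selected   <- c(selected, best)       # carried through all later steps
    candidates <- setdiff(candidates, best)
    cost       <- c(cost, min(costs))
  }
  data.frame(step = seq_along(selected), added = selected, cost = cost,
             stringsAsFactors = FALSE)
}

path <- forward_search(c("x1", "x2", "x3", "x4"), dat, fold_ids)
path
# Model size with the lowest average cost along the greedy path
best_size <- path$step[which.min(path$cost)]
```

Because earlier choices are never revisited, the selected model is locally rather than globally optimal.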

Backward Model Search:

The covariate removed in the first step is eliminated through all other steps. The result is a single model determined to be locally optimal based upon the performances across all runs and folds.
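A backward search can be sketched the same way, this time with logistic regression and AUC as the cost to maximize. Again, this is a hedged illustration rather than the package's code: the simulated data and the helpers `auc()`, `cv_auc()`, and `backward_search()` are hypothetical names introduced for the example.

```r
# Illustrative sketch in base R; all names below are hypothetical.
set.seed(101)
n   <- 300L
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- rbinom(n, 1L, plogis(2 * dat$x1 - dat$x3))  # x2 and x4 are noise

# Fixed fold assignments shared by every evaluation.
runs  <- 4L
folds <- 3L
fold_ids <- replicate(runs, sample(rep_len(seq_len(folds), n)))  # n x runs

# AUC via the rank-sum (Mann-Whitney) identity.
auc <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Mean test-set AUC of a logistic model across all runs and folds.
cv_auc <- function(features, data, fold_ids) {
  aucs <- numeric(0)
  for (r in seq_len(ncol(fold_ids))) {
    for (k in seq_len(max(fold_ids))) {
      test <- fold_ids[, r] == k
      fit  <- glm(reformulate(features, response = "y"),
                  data = data[!test, ], family = binomial())
      prob <- predict(fit, newdata = data[test, ], type = "response")
      aucs <- c(aucs, auc(prob, data$y[test]))
    }
  }
  mean(aucs)
}

# Greedy backward search: at each step, drop the feature whose removal
# leaves the highest average AUC; a dropped feature never returns.
backward_search <- function(features, data, fold_ids) {
  removed  <- character(0)
  auc_path <- numeric(0)
  while (length(features) > 1L) {
    aucs <- vapply(features,
                   function(f) cv_auc(setdiff(features, f), data, fold_ids),
                   numeric(1))
    least_useful <- names(which.max(aucs))
    features     <- setdiff(features, least_useful)
    removed      <- c(removed, least_useful)
    auc_path     <- c(auc_path, max(aucs))
  }
  list(path = data.frame(step = seq_along(removed), removed = removed,
                         auc = auc_path, stringsAsFactors = FALSE),
       kept = features)
}

res <- backward_search(c("x1", "x2", "x3", "x4"), dat, fold_ids)
res$path
res$kept
```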

Silence notches

For the plot() routine, the message "Notch went outside hinges" is often triggered by ggplot2. It can be silenced by setting a global option:

options(rlib_message_verbosity = "quiet")

Author

Stu Field, Kirk DeLisle

Examples

data <- wranglr::simdata

# Setup response variable
data$class_response <- as.factor(data$class_response)
mt  <- model_type_lr("class_response")
sm  <- search_type_forward_model("Forward Selection Algorithm", 10L)
ft  <- head(helpr:::get_analytes(data))  # select candidate features
mcp <- feature_selection(data,
                         candidate_features = ft,
                         model_type = mt,
                         search_type = sm,
                         cost = "AUC",
                         runs = 4L, folds = 3L,
                         random_seed = 101L)

fs <- Search(mcp)
#>  Starting the Feature Selection algorithm ...
#> ── Using `Forward-Stepwise` model search ──────────────────────────────
#>  Step 1 of 6
#>  Step 2 of 6
#>  Step 3 of 6
#>  Step 4 of 6
#>  Step 5 of 6
#>  Step 6 of 6
fs
#> ══ Feature Selection Object ═══════════════════════════════════════════
#> ── Dataset Info ───────────────────────────────────────────────────────
#>  Rows                      100
#>  Columns                   55
#>  FeatureData               6
#> ── Search Optimization Info ───────────────────────────────────────────
#>  No. Candidates            '6'
#>  Response Field            'class_response'
#>  Cross Validation Runs     '4'
#>  Cross Validation Folds    '3'
#>  Stratified Folds          'FALSE'
#>  Model Type                'fs_lr'
#>  Search Type               'fs_forward_model'
#>  Cost Function             'AUC'
#>  Random Seed               '101'
#>  Display Name              'Forward Selection Algorithm'
#>  Search Complete           'TRUE'
#> ═══════════════════════════════════════════════════════════════════════

plot(fs)
#> Notch went outside hinges
#>  Do you want `notch = FALSE`?
#> Notch went outside hinges
#>  Do you want `notch = FALSE`?
#> Notch went outside hinges
#>  Do you want `notch = FALSE`?
#> Notch went outside hinges
#>  Do you want `notch = FALSE`?
#> Notch went outside hinges
#>  Do you want `notch = FALSE`?


# Using parallel processing:
#   should be ~4x faster than above
if (FALSE) { # \dontrun{
  fs <- Search(mcp, num_cores = 4L)
} # }