Statistical learning: performance assessment

resampling methods

the credit data set

want: predict balance as a response

library(tidymodels)   # rsample, recipes, parsnip, workflows, tune, yardstick, dplyr
library(kableExtra)   # kbl(), kable_styling()
library(janitor)      # clean_names()
data(Credit, package = "ISLR2")
credit = Credit %>% clean_names()
credit %>% slice_sample(n = 12) %>% kbl() %>% kable_styling(font_size = 10)
income limit rating cards age education own student married region balance
51.532 5096 380 2 31 15 No No Yes South 481
148.080 8157 599 2 83 13 No No Yes South 454
14.312 5382 367 1 59 17 No Yes No West 1380
14.956 4640 332 2 33 6 No No No West 681
30.007 6481 462 2 69 9 Yes No Yes South 1093
82.706 7506 536 2 64 13 Yes No Yes West 905
21.551 5380 420 5 51 18 No No Yes West 907
56.256 5521 406 2 72 16 Yes Yes Yes South 1020
62.413 6457 455 2 71 11 Yes No Yes South 762
23.283 5443 407 4 49 13 No No Yes East 912
41.419 2120 184 4 24 11 Yes Yes No South 156
41.192 3673 297 3 54 16 Yes No Yes South 121

validation approaches

train/test

Randomly select a proportion (e.g. .75) of observations for training, the rest is for testing

validation set

Select a proportion (e.g. .8) of observations to train the model, the rest is for validation

cross-validation

Leave-one-out cross-validation: all observations but one are used for training, and the one left out is used for validation. The procedure is repeated until each observation has been used once for validation.

K-fold cross-validation: K-1 folds are used for training, the remaining fold for validation. The procedure is repeated until each fold has been used once for validation.

the CV estimate of the evaluation metric is the average over the \(n\) (leave-one-out) or \(K\) (K-fold) iterations
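
For instance, with RMSE as the evaluation metric, the K-fold CV estimate is the standard

\[
\widehat{\text{RMSE}}_{CV} = \frac{1}{K} \sum_{k=1}^{K} \text{RMSE}_k ,
\]

where \(\text{RMSE}_k\) is computed on the \(k\)-th validation fold (leave-one-out corresponds to \(K = n\)).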

Validation-set to assess model performance

train/test split

set.seed(1234)
# keep balance plus 3 randomly chosen predictors (balance is column 11)
credit_small = credit %>% select(balance, sample(names(credit)[-11], 3))
first_split = initial_split(data = credit_small, prop = .75, strata = balance)
cred_tr = training(first_split)
cred_test = testing(first_split)

analysis/validation split

val_split = validation_split(data = cred_tr, prop=.8,strata = balance)

Validation-set to assess model performance

pre-processing

cred_an = analysis(val_split %>% pull(splits) %>% .[[1]])
cred_prep = recipe(balance~., data = cred_an)

model-spec

cred_mod = linear_reg(mode="regression", engine="lm")

workflow setup

cred_wflow = workflow() %>% 
  add_recipe(cred_prep) %>% 
  add_model(cred_mod)

model fit

cred_fit = cred_wflow %>% 
  fit(cred_an)

assessment set predictions

cred_as = assessment(val_split %>%  pull(splits) %>% .[[1]])
cred_preds = cred_fit %>% augment(cred_as) %>% select(balance,.pred)

Validation-set to assess model performance

assessment metric: RMSE

cred_preds %>% rmse(truth=balance, estimate=.pred) %>% 
  kbl()
.metric .estimator .estimate
rmse standard 455.9919
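
Here RMSE is the usual root mean squared error, computed on the \(n_{val}\) observations of the assessment set:

\[
\text{RMSE} = \sqrt{\frac{1}{n_{val}} \sum_{i=1}^{n_{val}} \left( y_i - \hat{y}_i \right)^2 },
\]

with \(y_i\) the observed balance and \(\hat{y}_i\) the model prediction.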

assessment RMSE: a shortcut

cred_wflow %>% 
  fit_resamples(val_split) %>% 
  collect_metrics() %>% filter(.metric == "rmse") %>% 
  kbl() 
.metric .estimator mean n std_err .config
rmse standard 455.9919 1 NA Preprocessor1_Model1
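
The shortcut can also return the validation-set predictions, provided they are saved via the control argument; a minimal sketch, using the same workflow and split as above:

cred_wflow %>% 
  fit_resamples(val_split, control = control_resamples(save_pred = TRUE)) %>% 
  collect_predictions() %>% 
  select(balance, .pred) %>% 
  head()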

v-fold cross validation to assess model performance

data split: folds

cred_folds = vfold_cv(cred_tr,v=5,strata=balance)

pre-processing

cred_prep = recipe(balance~., data = cred_tr)

model-spec

cred_mod = linear_reg(mode="regression", engine="lm")

workflow setup

cred_wflow = workflow() %>% 
  add_recipe(cred_prep) %>% 
  add_model(cred_mod)

v-fold cross validation to assess model performance

model fit

cred_fit_vfolds = cred_wflow %>% 
  fit_resamples(cred_folds)
# Resampling results
# 5-fold cross-validation using stratification 
# A tibble: 5 × 4
  splits           id    .metrics         .notes          
  <list>           <chr> <list>           <list>          
1 <split [239/60]> Fold1 <tibble [2 × 4]> <tibble [0 × 3]>
2 <split [239/60]> Fold2 <tibble [2 × 4]> <tibble [0 × 3]>
3 <split [239/60]> Fold3 <tibble [2 × 4]> <tibble [0 × 3]>
4 <split [239/60]> Fold4 <tibble [2 × 4]> <tibble [0 × 3]>
5 <split [240/59]> Fold5 <tibble [2 × 4]> <tibble [0 × 3]>

cross-validated RMSE

cred_fit_vfolds %>% 
  collect_metrics() %>% filter(.metric == "rmse") %>% 
  kbl() 
.metric .estimator mean n std_err .config
rmse standard 456.6622 5 16.18599 Preprocessor1_Model1
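
To inspect the per-fold values rather than their average, collect_metrics() can be called without summarising (a small optional check):

cred_fit_vfolds %>% 
  collect_metrics(summarize = FALSE) %>% 
  filter(.metric == "rmse")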

Leave One Out cross validation to assess model performance

Note: leave-one-out cross-validation is somewhat deprecated; one can still use vfold_cv() and set as many folds as there are rows in the training set

cred_loo = vfold_cv(cred_tr,v=nrow(cred_tr))

Leave One Out cross validation to assess model performance

model fit

cred_fit_loo = cred_wflow %>%
  fit_resamples(cred_loo)

cross-validated RMSE

cred_fit_loo %>%
  collect_metrics() %>% filter(.metric == "rmse") %>%
  kbl() 
.metric .estimator mean n std_err .config
rmse standard 395.059 299 13.91894 Preprocessor1_Model1

hyperparameter tuning

tiny example on polynomial regression: balance vs rating

Consider a tiny example of a tuning process:

credit_tiny = credit %>% select(rating, balance)
credit_split = initial_split(credit_tiny, prop = 3/4)
# label each observation as train or test, then plot the split
credit_tiny %>% 
  mutate(train_test = replace(rep("test", n()), credit_split$in_id, "train")) %>% 
  ggplot(aes(x = rating, y = balance, color = train_test)) + 
  geom_point() + 
  theme_minimal()

tiny example on polynomial regression: balance vs rating

define the CV-folds on the training set

tiny_cred_folds = vfold_cv(training(credit_split), v = 5)

The degree of the polynomial is set at the recipe level. Since it is a hyperparameter, we set it to tune(), a placeholder

tiny_rec = recipe(formula = balance~rating, data = training(credit_split)) %>% 
  step_poly(rating, degree=tune())

The model specification does not change, so cred_mod can still be used. The workflow is then

tiny_wflow = workflow() %>% add_recipe(tiny_rec) %>% add_model(cred_mod)

It remains to specify the grid of hyperparameter values and call tune_grid(), which does all the work

hparm_grid = tibble(degree=1:10)
tiny_cred_tuning = tiny_wflow %>% 
  tune_grid(resamples = tiny_cred_folds,
            grid = hparm_grid,
            control = control_grid(save_pred = TRUE)
  )

tiny example on polynomial regression: balance vs rating

Check the results

tiny_cred_tuning %>%  collect_metrics() %>% filter(.metric=="rmse") %>% slice_min(mean) %>% 
  kbl() 
degree .metric .estimator mean n std_err .config
7 rmse standard 219.9721 5 16.5709 Preprocessor07_Model1
autoplot(tiny_cred_tuning,metric = "rmse")+theme_minimal()
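
An alternative to reading the plot is show_best(), which lists the top configurations for a given metric; here, the three degrees with the lowest cross-validated RMSE:

tiny_cred_tuning %>% show_best(metric = "rmse", n = 3) %>% kbl()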

tiny example on polynomial regression: balance vs rating

Select the best performing model

tiny_cred_final_mod = tiny_cred_tuning %>% select_best(metric = "rmse")

Finalize the workflow (that is, plug the selected degree into the workflow in place of the tune() placeholder)

final_wflow=tiny_wflow %>% finalize_workflow(tiny_cred_final_mod)

Final fit and evaluation of the model: fit on the training set, evaluate on the test set. last_fit() takes credit_split, the result of the initial initial_split()

tiny_final_fit = final_wflow %>% 
  last_fit(credit_split)

tiny_final_fit %>% collect_metrics() %>% kbl() 
.metric .estimator .estimate .config
rmse standard 211.7520791 Preprocessor1_Model1
rsq standard 0.7413238 Preprocessor1_Model1
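
To reuse the selected model on fresh data, the workflow fitted by last_fit() can be extracted and used for prediction; a minimal sketch, with a few made-up rating values purely for illustration:

# workflow refitted on the whole training set by last_fit()
tiny_best_wflow = tiny_final_fit %>% extract_workflow()
# predicted balance for some hypothetical rating values
tiny_best_wflow %>% predict(new_data = tibble(rating = c(300, 500, 700)))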