Statistical learning: performance assessment

resampling methods

the credit data set

want: predict balance as a response

library(tidymodels)   # rsample, recipes, parsnip, workflows, tune, yardstick, dplyr
library(kableExtra)   # kbl(), kable_styling()
library(janitor)      # clean_names()
data(Credit, package = "ISLR2")
credit = Credit %>% clean_names()
credit %>% slice_sample(n = 12) %>% kbl() %>% kable_styling(font_size = 10)
income limit rating cards age education own student married region balance
51.532 5096 380 2 31 15 No No Yes South 481
148.080 8157 599 2 83 13 No No Yes South 454
14.312 5382 367 1 59 17 No Yes No West 1380
14.956 4640 332 2 33 6 No No No West 681
30.007 6481 462 2 69 9 Yes No Yes South 1093
82.706 7506 536 2 64 13 Yes No Yes West 905
21.551 5380 420 5 51 18 No No Yes West 907
56.256 5521 406 2 72 16 Yes Yes Yes South 1020
62.413 6457 455 2 71 11 Yes No Yes South 762
23.283 5443 407 4 49 13 No No Yes East 912
41.419 2120 184 4 24 11 Yes Yes No South 156
41.192 3673 297 3 54 16 Yes No Yes South 121

validation approaches

train/test

Randomly select a proportion (e.g. .75) of observations for training, the rest is for testing

validation set

Select a proportion (e.g. .8) of observations to train the model, the rest is for validation

cross-validation

Leave-one-out cross-validation: all observations but one are used for training, and the one left out is used for validation. The procedure is repeated until each observation has been used once for validation.

K-fold cross-validation: K-1 folds are used for training, the remaining fold for validation. The procedure is repeated until each fold has been used once for validation.

the CV estimate of the evaluation metric is the average over the \(n\) (leave-one-out) or \(K\) (K-fold) iterations
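
For instance, with RMSE as the evaluation metric, the K-fold CV estimate is the standard

\[
\widehat{\text{RMSE}}_{CV} = \frac{1}{K} \sum_{k=1}^{K} \text{RMSE}_k ,
\]

where \(\text{RMSE}_k\) is computed on the \(k\)-th validation fold (leave-one-out corresponds to \(K = n\)).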

Validation-set to assess model performance

train/test split

set.seed(1234)
# keep balance plus 3 randomly chosen predictors (balance is column 11)
credit_small = credit %>% select(balance, sample(names(credit)[-11], 3))
first_split = initial_split(data = credit_small, prop = .75, strata = balance)
cred_tr = training(first_split)
cred_test = testing(first_split)

analysis/validation split

val_split = validation_split(data = cred_tr, prop=.8,strata = balance)

Validation-set to assess model performance

pre-processing

cred_an = analysis(val_split %>% pull(splits) %>% .[[1]])
cred_prep = recipe(balance~., data = cred_an)

model-spec

cred_mod = linear_reg(mode="regression", engine="lm")

workflow setup

cred_wflow = workflow() %>% 
  add_recipe(cred_prep) %>% 
  add_model(cred_mod)

model fit

cred_fit = cred_wflow %>% 
  fit(cred_an)

assessment set predictions

cred_as = assessment(val_split %>%  pull(splits) %>% .[[1]])
cred_preds = cred_fit %>% augment(cred_as) %>% select(balance,.pred)

Validation-set to assess model performance

assessment metric: RMSE

cred_preds %>% rmse(truth=balance, estimate=.pred) %>% 
  kbl()
.metric .estimator .estimate
rmse standard 455.9919
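
Here RMSE is the usual root mean squared error, computed on the \(n_{val}\) observations of the assessment set:

\[
\text{RMSE} = \sqrt{\frac{1}{n_{val}} \sum_{i=1}^{n_{val}} \left( y_i - \hat{y}_i \right)^2 },
\]

with \(y_i\) the observed balance and \(\hat{y}_i\) the model prediction.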

assessment RMSE: a shortcut

cred_wflow %>% 
  fit_resamples(val_split) %>% 
  collect_metrics() %>% filter(.metric == "rmse") %>% 
  kbl() 
.metric .estimator mean n std_err .config
rmse standard 455.9919 1 NA Preprocessor1_Model1
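
The shortcut can also return the validation-set predictions, provided they are saved via the control argument; a minimal sketch, using the same workflow and split as above:

cred_wflow %>% 
  fit_resamples(val_split, control = control_resamples(save_pred = TRUE)) %>% 
  collect_predictions() %>% 
  select(balance, .pred) %>% 
  head()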

v-fold cross validation to assess model performance

data split: folds

cred_folds = vfold_cv(cred_tr,v=5,strata=balance)

pre-processing

cred_prep = recipe(balance~., data = cred_tr)

model-spec

cred_mod = linear_reg(mode="regression", engine="lm")

workflow setup

cred_wflow = workflow() %>% 
  add_recipe(cred_prep) %>% 
  add_model(cred_mod)

v-fold cross validation to assess model performance

model fit

cred_fit_vfolds = cred_wflow %>% 
  fit_resamples(cred_folds)
# Resampling results
# 5-fold cross-validation using stratification 
# A tibble: 5 × 4
  splits           id    .metrics         .notes          
  <list>           <chr> <list>           <list>          
1 <split [239/60]> Fold1 <tibble [2 × 4]> <tibble [0 × 3]>
2 <split [239/60]> Fold2 <tibble [2 × 4]> <tibble [0 × 3]>
3 <split [239/60]> Fold3 <tibble [2 × 4]> <tibble [0 × 3]>
4 <split [239/60]> Fold4 <tibble [2 × 4]> <tibble [0 × 3]>
5 <split [240/59]> Fold5 <tibble [2 × 4]> <tibble [0 × 3]>

cross-validated RMSE

cred_fit_vfolds %>% 
  collect_metrics() %>% filter(.metric == "rmse") %>% 
  kbl() 
.metric .estimator mean n std_err .config
rmse standard 456.6622 5 16.18599 Preprocessor1_Model1
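
To inspect the per-fold values rather than their average, collect_metrics() can be called without summarising (a small optional check):

cred_fit_vfolds %>% 
  collect_metrics(summarize = FALSE) %>% 
  filter(.metric == "rmse")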

Leave One Out cross validation to assess model performance

Note: leave-one-out cross-validation is somewhat deprecated; one can still use vfold_cv() and set as many folds as there are rows in the training set

cred_loo = vfold_cv(cred_tr,v=nrow(cred_tr))

Leave One Out cross validation to assess model performance

model fit

cred_fit_loo = cred_wflow %>%
  fit_resamples(cred_loo)

cross-validated RMSE

cred_fit_loo %>%
  collect_metrics() %>% filter(.metric == "rmse") %>%
  kbl() 
.metric .estimator mean n std_err .config
rmse standard 395.059 299 13.91894 Preprocessor1_Model1

hyperparameter tuning

tiny example on polynomial regression: balance vs rating

Consider a tiny example of a tuning process:

credit_tiny = credit %>% select(rating, balance)
credit_split = initial_split(credit_tiny, prop = 3/4)
# label each observation as train or test, then plot the split
credit_tiny %>% 
  mutate(train_test = replace(rep("test", n()), credit_split$in_id, "train")) %>% 
  ggplot(aes(x = rating, y = balance, color = train_test)) + 
  geom_point() + 
  theme_minimal()

tiny example on polynomial regression: balance vs rating

define the CV-folds on the training set

tiny_cred_folds = vfold_cv(training(credit_split), v = 5)

The degree of the polynomial is set at the recipe level. Since it is a hyperparameter, we set it to tune(), a placeholder

tiny_rec = recipe(formula = balance~rating, data = training(credit_split)) %>% 
  step_poly(rating, degree=tune())

The model specification does not change, so cred_mod can still be used. The workflow is then

tiny_wflow = workflow() %>% add_recipe(tiny_rec) %>% add_model(cred_mod)

It remains to specify the grid of hyperparameter values and call tune_grid(), which does all the work

hparm_grid = tibble(degree=1:10)
tiny_cred_tuning = tiny_wflow %>% 
  tune_grid(resamples = tiny_cred_folds,
            grid = hparm_grid,
            control = control_grid(save_pred = TRUE)
  )

tiny example on polynomial regression: balance vs rating

Check the results

tiny_cred_tuning %>%  collect_metrics() %>% filter(.metric=="rmse") %>% slice_min(mean) %>% 
  kbl() 
degree .metric .estimator mean n std_err .config
7 rmse standard 219.9721 5 16.5709 Preprocessor07_Model1
autoplot(tiny_cred_tuning,metric = "rmse")+theme_minimal()
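
An alternative to reading the plot is show_best(), which lists the top configurations for a given metric; here, the three degrees with the lowest cross-validated RMSE:

tiny_cred_tuning %>% show_best(metric = "rmse", n = 3) %>% kbl()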

tiny example on polynomial regression: balance vs rating

Select the best performing model

tiny_cred_final_mod = tiny_cred_tuning %>% select_best(metric = "rmse")

Finalize the workflow (that is, plug the selected degree into the workflow in place of the tune() placeholder)

final_wflow=tiny_wflow %>% finalize_workflow(tiny_cred_final_mod)

Final fit and evaluation of the model: fit on the training set, evaluate on the test set. last_fit() takes credit_split, the result of the initial initial_split()

tiny_final_fit = final_wflow %>% 
  last_fit(credit_split)

tiny_final_fit %>% collect_metrics() %>% kbl() 
.metric .estimator .estimate .config
rmse standard 211.7520791 Preprocessor1_Model1
rsq standard 0.7413238 Preprocessor1_Model1
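
To reuse the selected model on fresh data, the workflow fitted by last_fit() can be extracted and used for prediction; a minimal sketch, with a few made-up rating values purely for illustration:

# workflow refitted on the whole training set by last_fit()
tiny_best_wflow = tiny_final_fit %>% extract_workflow()
# predicted balance for some hypothetical rating values
tiny_best_wflow %>% predict(new_data = tibble(rating = c(300, 500, 700)))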