pre-processing
split your observations into training, validation (or use cross-validation), and test sets
transform the predictors appropriately (feature engineering)
model-spec
specify the model to fit
tuning
select a reasonable grid of hyperparameter values to choose from
for each combination of hyperparameters:
- fit the model on the training observations
- compute appropriate metrics on the evaluation observations
pick the best hyperparameter combination
final evaluation and fit
compute the metric for the tuned model on the test set (observations never used so far)
obtain the final fit of the model on all the available observations
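The whole flow above can be sketched with tidymodels. This is a minimal illustration, not the course's own code: the data frame `dat`, the outcome `y`, and the choice of a lasso regression with a tunable `penalty` are all assumptions made for the example.

```r
library(tidymodels)

set.seed(123)
# pre-processing: split, resample, engineer features
dat_split <- initial_split(dat, prop = 0.8)            # training/test split
folds     <- vfold_cv(training(dat_split), v = 5)      # cross-validation folds

rec <- recipe(y ~ ., data = training(dat_split)) %>%
  step_normalize(all_numeric_predictors())

# model-spec: a lasso regression with a hyperparameter to tune (assumed example)
spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

wf <- workflow() %>% add_recipe(rec) %>% add_model(spec)

# tuning: fit each grid value on the folds, collect metrics
grid <- grid_regular(penalty(), levels = 20)
res  <- tune_grid(wf, resamples = folds, grid = grid,
                  metrics = metric_set(rmse))
best <- select_best(res, metric = "rmse")              # best combination

# final evaluation and fit: metric on the test set, then refit on everything
final_res <- wf %>% finalize_workflow(best) %>% last_fit(dat_split)
collect_metrics(final_res)
```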
All the core packages in the tidymodels refer to one step of a supervised learning flow
the rsample package provides tools for data splitting and resampling
the recipes package provides tools for data pre-processing and feature engineering
the parsnip package provides a unified interface for specifying the many models available in R
the workflows package combines together pre-processing, modeling and post-processing steps
the yardstick package provides several performance metrics
the dials package provides tools to define grids of hyperparameter values
the tune package greatly simplifies implementing hyperparameter optimization
the broom package provides utility functions to tidy model output
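All the core packages above are attached at once through the tidymodels meta-package; as a small illustration of broom, the `mtcars` example below is an assumption made here, not part of the course material:

```r
library(tidymodels)  # attaches rsample, recipes, parsnip, workflows,
                     # yardstick, dials, tune, broom, and friends

fit <- lm(mpg ~ wt, data = mtcars)
tidy(fit)    # broom: coefficient table as a tibble
glance(fit)  # broom: one-row model-level summary
```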
The idea is to:
fit the model on a set of observations (training)
assess the model performance on a different set of observations (testing)
Refer to the Advertising data and split the observations into training/testing sets
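A possible splitting step with rsample, assuming `Advertising` is the usual advertising data frame (columns `sales`, `TV`, `radio`, `newspaper`) already loaded in the session:

```r
library(tidymodels)

set.seed(42)
# reserve 75% of the observations for training, the rest for testing
adv_split <- initial_split(Advertising, prop = 0.75)
adv_train <- training(adv_split)
adv_test  <- testing(adv_split)
```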
training set
build a workflow and fit the model on the training set
rmse (root mean squared error)
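The fit-then-evaluate steps might look as follows; this sketch assumes the `adv_train`/`adv_test` objects come from an earlier `initial_split()` of the Advertising data and that a plain linear regression of `sales` on the three media budgets is the intended model:

```r
library(tidymodels)

# model-spec: ordinary least squares via the "lm" engine
lm_spec <- linear_reg() %>% set_engine("lm")

# workflow: formula pre-processing + model, fit on the training set
adv_wf <- workflow() %>%
  add_formula(sales ~ TV + radio + newspaper) %>%
  add_model(lm_spec)

adv_fit <- fit(adv_wf, data = adv_train)

# yardstick: root mean squared error on the test set
augment(adv_fit, new_data = adv_test) %>%
  rmse(truth = sales, estimate = .pred)
```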