pre-processing
split your observations into training, validation (or use cross-validation), and test sets
transform the predictors properly (feature engineering)
model-spec
specify the model to fit
tuning
select a reasonable grid of model hyperparameter values to choose from
for each combination of hyperparameters
- fit the model on the training observations
- compute appropriate metrics on the validation observations
pick the best hyperparameter combination
final evaluation and fit
compute the metric for the tuned model on the test set (observations not used so far)
obtain the final fit for the model on all the available observations
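The steps above can be sketched end-to-end with tidymodels. This is a minimal outline, not a definitive recipe: the data frame `my_data`, the outcome `y`, the choice of a lasso model, and the penalty grid are all hypothetical placeholders.

```r
library(tidymodels)

# pre-processing: split into training/testing, resample the training set
set.seed(123)
my_split <- initial_split(my_data, prop = 0.8)   # my_data is hypothetical
my_train <- training(my_split)
my_test  <- testing(my_split)
my_folds <- vfold_cv(my_train, v = 5)            # 5-fold cross-validation

# transform the predictors (feature engineering)
my_rec <- recipe(y ~ ., data = my_train) |>
  step_normalize(all_numeric_predictors())

# model-spec: specify the model to fit (here, a lasso regression)
my_spec <- linear_reg(penalty = tune(), mixture = 1) |>
  set_engine("glmnet")

my_wf <- workflow() |>
  add_recipe(my_rec) |>
  add_model(my_spec)

# tuning: fit on each resample over a grid, then pick the best combination
my_grid <- grid_regular(penalty(), levels = 20)
my_res  <- tune_grid(my_wf, resamples = my_folds, grid = my_grid)
my_best <- select_best(my_res, metric = "rmse")

# final evaluation and fit: metric on the test set, then fit on all data
my_final <- finalize_workflow(my_wf, my_best)
last_fit(my_final, my_split) |> collect_metrics()
fit(my_final, data = my_data)
```

Each stage maps onto one of the core packages introduced next (rsample for the split, recipes for pre-processing, parsnip for the model spec, tune/dials for tuning, yardstick for the metrics).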
All the core packages in the tidymodels
refer to one step of a supervised learning flow
the rsample
package provides tools for data splitting and resampling
the recipes
package provides tools for data pre-processing and feature engineering
the parsnip
package provides a unified interface to specify several models available in R
the workflows
package combines pre-processing, modeling and post-processing steps
the yardstick
package provides several performance metrics
the dials
package provides tools to define hyperparameters value grids
the tune
package greatly simplifies the implementation of hyperparameter optimization
the broom
package provides utility functions to turn model output into tidy tibbles
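For instance, broom turns the output of a base R model into tidy tibbles; a small sketch using R's built-in mtcars data:

```r
library(broom)

# fit an ordinary linear model with base R
fit <- lm(mpg ~ wt + hp, data = mtcars)

tidy(fit)     # one row per coefficient: term, estimate, std.error, ...
glance(fit)   # one-row model summary: r.squared, AIC, ...
augment(fit)  # original data plus fitted values and residuals
```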
The idea is to:
fit the model on a set of observations (training)
assess the model performance on a different set of observations (testing)
Refer to the Advertising data
, split the observations into training/testing
training set
workflow
and fit the model on the training set
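A minimal sketch of these steps, assuming the Advertising data has already been loaded into a data frame named `Advertising` with outcome `sales` (the object names and split proportion here are illustrative):

```r
library(tidymodels)

# split the observations into training/testing
set.seed(1)
adv_split <- initial_split(Advertising, prop = 0.75)
adv_train <- training(adv_split)
adv_test  <- testing(adv_split)

# workflow: combine a model formula with a linear regression spec
adv_wf <- workflow() |>
  add_formula(sales ~ TV + radio + newspaper) |>
  add_model(linear_reg())

# fit the model on the training set
adv_fit <- fit(adv_wf, data = adv_train)
```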
rmse
(root mean squared error)
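With yardstick, rmse is computed from a data frame holding the observed and predicted values. A self-contained sketch with toy numbers (the values below are made up purely to illustrate the call):

```r
library(yardstick)

# hypothetical observed vs. predicted values
results <- data.frame(
  truth    = c(3.0, 5.0, 2.5, 7.0),
  estimate = c(2.8, 5.4, 2.9, 6.1)
)

# root mean squared error: sqrt(mean((truth - estimate)^2))
rmse(results, truth = truth, estimate = estimate)  # .estimate ≈ 0.541
```

On real data, `truth` would be the test-set outcome and `estimate` the column produced by `predict()` on the fitted workflow.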