class: center, middle, inverse, title-slide # Linear Regression (part 1) ## with tidymodels ### Statistical Learning ### Alfonso Iodice D'Enza --- class: animated fadeIn center middle inverse # prologue... --- class: animated fadeIn ### supervised learning flow .my-pull-left[ **pre-processing** ] .my-pull-right[ > - split your observations into **training**, **validation** (or, cross-validate) and **test** > > - transform the predictors properly (** feature engineering **) ] -- .my-pull-left[ **model-spec** ] .my-pull-right[ > - **specify** the model to fit ] -- .my-pull-left[ **tuning** ] .my-pull-right[ > - select a reasonable grid of model **hyperparameter(s)** values to choose from > > - for each combination of hyperparameters > - **fit** the model on training observations > - compute appropriate **metrics** on evaluation observations > > - pick the **best** hyperparameter combination ] -- .my-pull-left[ **final evaluation and fit** ] .my-pull-right[ > - compute the metric for the **tuned model** on the **test** set (observations never used yet) > > - obtain the **final fit** for the model on all the available observations ] --- class: animated fadeIn ### the tidymodels metapackage ** Each core package in tidymodels covers one step of the supervised learning flow ** <center> <img src="./figures/tidymodels.png" alt="tidymodels logo" height="300px" /> </center> ### For all things tidymodels check [tidymodels.org](https://www.tidymodels.org)! 
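--- class: animated fadeIn ### the flow in code The whole flow can be sketched with the core packages. A minimal, illustrative sketch: the split proportion, the normalization step and the use of the `\(\texttt{Advertising}\)` data here are assumptions for the example, not part of the analysis that follows.

```r
library(tidymodels)

set.seed(123)
# rsample: split the observations into training and test sets
adv_split = initial_split(adv_data, prop = 0.75)
# recipes: pre-processing / feature engineering
adv_rec = recipe(sales ~ ., data = training(adv_split)) %>%
  step_normalize(all_numeric_predictors())
# parsnip: model specification
lm_spec = linear_reg() %>% set_engine("lm")
# workflows: bundle pre-processing and model together
adv_wf = workflow() %>% add_recipe(adv_rec) %>% add_model(lm_spec)
# fit on the training set, then compute yardstick metrics on the test set
adv_fit = fit(adv_wf, data = training(adv_split))
adv_fit %>%
  augment(new_data = testing(adv_split)) %>%
  metrics(truth = sales, estimate = .pred)
```

Each line maps onto one of the packages introduced next.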
--- class: animated fadeIn ### the tidymodels core ** Each core package in tidymodels covers one step of the supervised learning flow ** <center> <img src="./figures/tidymodels_all.jpeg" alt="all" height="500px" /> </center> --- class: animated fadeIn ### the tidymodels core the ** `\(\texttt{rsample}\)` ** package provides tools for data splitting and resampling <center> <img src="./figures/tidymodels_rsample.jpeg" alt="rsample" height="500px" /> </center> --- class: animated fadeIn ### the tidymodels core the ** `\(\texttt{recipes}\)` ** package provides tools for data pre-processing and feature engineering <center> <img src="./figures/tidymodels_recipes.jpeg" alt="recipes" height="500px" /> </center> --- class: animated fadeIn ### the tidymodels core the ** `\(\texttt{parsnip}\)` ** package provides a unified interface to the many models available in R <center> <img src="./figures/tidymodels_parsnip.jpeg" alt="parsnip" height="500px" /> </center> --- class: animated fadeIn ### the tidymodels core the ** `\(\texttt{workflows}\)` ** package combines pre-processing, modeling and post-processing steps <center> <img src="./figures/tidymodels_workflows.jpeg" alt="workflows" height="500px" /> </center> --- class: animated fadeIn ### the tidymodels core the ** `\(\texttt{yardstick}\)` ** package provides several performance metrics <center> <img src="./figures/tidymodels_yardstick.jpeg" alt="yardstick" height="500px" /> </center> --- class: animated fadeIn ### the tidymodels core the ** `\(\texttt{dials}\)` ** package provides tools to define hyperparameter value grids <center> <img src="./figures/tidymodels_dials.jpeg" alt="dials" height="500px" /> </center> --- class: animated fadeIn ### the tidymodels core the ** `\(\texttt{tune}\)` ** package greatly simplifies the implementation of hyperparameter optimization <center> <img src="./figures/tidymodels_tune.jpeg" alt="tune" height="500px" /> </center> --- class: animated fadeIn ### the 
tidymodels core the ** `\(\texttt{broom}\)` ** package provides utility functions to tidify model output <center> <img src="./figures/tidymodels_broom.jpeg" alt="tune" height="500px" /> </center> --- class: animated fadeIn center middle inverse # ...end of prologue --- class: animated fadeIn ### Advertising data ```r adv_data = read_csv(file="./data/Advertising.csv") %>% select(-1) adv_data %>% slice_sample(n = 8) %>% kbl() %>% kable_styling(font_size=10) ``` <table class="table" style="font-size: 10px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> TV </th> <th style="text-align:right;"> radio </th> <th style="text-align:right;"> newspaper </th> <th style="text-align:right;"> sales </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 25.1 </td> <td style="text-align:right;"> 25.7 </td> <td style="text-align:right;"> 43.3 </td> <td style="text-align:right;"> 8.5 </td> </tr> <tr> <td style="text-align:right;"> 198.9 </td> <td style="text-align:right;"> 49.4 </td> <td style="text-align:right;"> 60.0 </td> <td style="text-align:right;"> 23.7 </td> </tr> <tr> <td style="text-align:right;"> 184.9 </td> <td style="text-align:right;"> 43.9 </td> <td style="text-align:right;"> 1.7 </td> <td style="text-align:right;"> 20.7 </td> </tr> <tr> <td style="text-align:right;"> 69.2 </td> <td style="text-align:right;"> 20.5 </td> <td style="text-align:right;"> 18.3 </td> <td style="text-align:right;"> 11.3 </td> </tr> <tr> <td style="text-align:right;"> 163.5 </td> <td style="text-align:right;"> 36.8 </td> <td style="text-align:right;"> 7.4 </td> <td style="text-align:right;"> 18.0 </td> </tr> <tr> <td style="text-align:right;"> 234.5 </td> <td style="text-align:right;"> 3.4 </td> <td style="text-align:right;"> 84.8 </td> <td style="text-align:right;"> 11.9 </td> </tr> <tr> <td style="text-align:right;"> 18.8 </td> <td style="text-align:right;"> 21.7 </td> <td style="text-align:right;"> 50.4 </td> <td style="text-align:right;"> 7.0 
</td> </tr> <tr> <td style="text-align:right;"> 131.1 </td> <td style="text-align:right;"> 42.8 </td> <td style="text-align:right;"> 28.9 </td> <td style="text-align:right;"> 18.0 </td> </tr> </tbody> </table> - ** sales ** is the response, indicating the level of sales in a specific market - ** TV **, ** Radio ** and ** Newspaper ** are the predictors, indicating the advertising budget spent on the corresponding media --- class: animated fadeIn ### Advertising data ```r adv_data_tidy = adv_data %>% pivot_longer(names_to="medium",values_to="budget",cols = 1:3) adv_data_tidy %>% slice_sample(n = 8) %>% kbl() %>% kable_styling(font_size=10) ``` <table class="table" style="font-size: 10px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> sales </th> <th style="text-align:left;"> medium </th> <th style="text-align:right;"> budget </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 15.2 </td> <td style="text-align:left;"> newspaper </td> <td style="text-align:right;"> 65.7 </td> </tr> <tr> <td style="text-align:right;"> 25.5 </td> <td style="text-align:left;"> radio </td> <td style="text-align:right;"> 42.0 </td> </tr> <tr> <td style="text-align:right;"> 18.9 </td> <td style="text-align:left;"> newspaper </td> <td style="text-align:right;"> 22.9 </td> </tr> <tr> <td style="text-align:right;"> 14.8 </td> <td style="text-align:left;"> TV </td> <td style="text-align:right;"> 280.2 </td> </tr> <tr> <td style="text-align:right;"> 10.8 </td> <td style="text-align:left;"> newspaper </td> <td style="text-align:right;"> 5.8 </td> </tr> <tr> <td style="text-align:right;"> 15.9 </td> <td style="text-align:left;"> radio </td> <td style="text-align:right;"> 16.7 </td> </tr> <tr> <td style="text-align:right;"> 18.9 </td> <td style="text-align:left;"> radio </td> <td style="text-align:right;"> 27.5 </td> </tr> <tr> <td style="text-align:right;"> 14.6 </td> <td style="text-align:left;"> TV </td> <td style="text-align:right;"> 78.2 </td> 
</tr> </tbody> </table> --- class: animated fadeIn ### Advertising data ```r adv_data_tidy %>% ggplot(aes(x = budget, y = sales)) + theme_minimal() + facet_wrap(~medium,scales = "free") + geom_point(color="indianred",alpha=.5,size=3) ``` <img src="Linear-Regression-part_1_files/figure-html/advert_plot-1.png" width="50%" style="display: block; margin: auto;" /> -- **Note**: the budget spent on TV is up to 300, less so for Radio and Newspaper --- class: animated fadeIn ### Advertising data: single regressions We can be **naive** and regress ** `\(\texttt{sales}\)` ** on ** `\(\texttt{tv}\)` **, ** `\(\texttt{newspaper}\)` ** and ** `\(\texttt{radio}\)` **, separately. --- class: animated fadeIn ### Advertising data: single regressions ```r *avd_lm_models_nest = adv_data_tidy %>% * group_by(medium) %>% * group_nest(.key = "datasets") %>% mutate( model_output = map(.x=datasets,~lm(sales~budget,data=.x)), model_params = map(.x=model_output, ~tidy(.x)), model_metrics = map(.x=model_output, ~glance(.x)), model_fitted = map(.x=model_output, ~augment(.x)) ) ``` ``` ## # A tibble: 3 × 2 ## medium datasets ## <chr> <list<tibble[,2]>> ## 1 newspaper [200 × 2] ## 2 radio [200 × 2] ## 3 TV [200 × 2] ``` --- class: animated fadeIn ### Advertising data: single regressions ```r avd_lm_models_nest = adv_data_tidy %>% group_by(medium) %>% group_nest(.key = "datasets") %>% * mutate( * model_output = map(.x=datasets,~lm(sales~budget,data=.x)), model_params = map(.x=model_output, ~tidy(.x)), model_metrics = map(.x=model_output, ~glance(.x)), model_fitted = map(.x=model_output, ~augment(.x)) * ) ``` ``` ## # A tibble: 3 × 3 ## medium datasets model_output ## <chr> <list<tibble[,2]>> <list> ## 1 newspaper [200 × 2] <lm> ## 2 radio [200 × 2] <lm> ## 3 TV [200 × 2] <lm> ``` --- class: animated fadeIn ### Advertising data: single regressions ```r avd_lm_models_nest = adv_data_tidy %>% group_by(medium) %>% group_nest(.key = "datasets") %>% * mutate( * model_output = 
map(.x=datasets,~lm(sales~budget,data=.x)), * model_params = map(.x=model_output, ~tidy(.x)), model_metrics = map(.x=model_output, ~glance(.x)), model_fitted = map(.x=model_output, ~augment(.x)) * ) ``` ``` ## # A tibble: 3 × 4 ## medium datasets model_output model_params ## <chr> <list<tibble[,2]>> <list> <list> ## 1 newspaper [200 × 2] <lm> <tibble [2 × 5]> ## 2 radio [200 × 2] <lm> <tibble [2 × 5]> ## 3 TV [200 × 2] <lm> <tibble [2 × 5]> ``` --- class: animated fadeIn ### Advertising data: single regressions ```r avd_lm_models_nest = adv_data_tidy %>% group_by(medium) %>% group_nest(.key = "datasets") %>% * mutate( * model_output = map(.x=datasets,~lm(sales~budget,data=.x)), * model_params = map(.x=model_output, ~tidy(.x)), * model_metrics = map(.x=model_output, ~glance(.x)), model_fitted = map(.x=model_output, ~augment(.x)) * ) ``` ``` ## # A tibble: 3 × 5 ## medium datasets model_output model_params model_metrics ## <chr> <list<tibble[,2]>> <list> <list> <list> ## 1 newspaper [200 × 2] <lm> <tibble [2 × 5]> <tibble [1 × 12]> ## 2 radio [200 × 2] <lm> <tibble [2 × 5]> <tibble [1 × 12]> ## 3 TV [200 × 2] <lm> <tibble [2 × 5]> <tibble [1 × 12]> ``` --- class: animated fadeIn ### Advertising data: single regressions ```r avd_lm_models_nest = adv_data_tidy %>% group_by(medium) %>% group_nest(.key = "datasets") %>% * mutate( * model_output = map(.x=datasets,~lm(sales~budget,data=.x)), * model_params = map(.x=model_output, ~tidy(.x)), * model_metrics = map(.x=model_output, ~glance(.x)), * model_fitted = map(.x=model_output, ~augment(.x)) * ) ``` ``` ## # A tibble: 3 × 6 ## medium datasets model_output model_params model_metrics model_fitted ## <chr> <list<tibble[,> <list> <list> <list> <list> ## 1 newspaper [200 × 2] <lm> <tibble> <tibble> <tibble> ## 2 radio [200 × 2] <lm> <tibble> <tibble> <tibble> ## 3 TV [200 × 2] <lm> <tibble> <tibble> <tibble> ``` -- This is a ** nested data structure ** with everything stored. 
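Individual elements can be pulled back out of the nest; for instance (an illustrative extraction, using standard `\(\texttt{dplyr}\)`/`\(\texttt{purrr}\)` verbs):

```r
# pull the tidy coefficient table of the TV model out of the nested tibble
avd_lm_models_nest %>%
  filter(medium == "TV") %>%
  pull(model_params) %>%
  pluck(1)
```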
The functions `\(\texttt{tidy}\)`, `\(\texttt{glance}\)` and `\(\texttt{augment}\)` pull all the information off of the model output, and arrange it in a tidy way. --- class: animated fadeIn ### Advertising data: nested structure The quantities nested in the tibble can be pulled out, or they can be *expanded* within the tibble itself, using `\(\texttt{unnest}\)` ```r avd_lm_models_nest %>% unnest(model_params) ``` ``` ## # A tibble: 6 × 10 ## medium datasets model_output term estimate std.error statistic p.value ## <chr> <list<tibb> <list> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 newspaper [200 × 2] <lm> (Int… 12.4 0.621 19.9 4.71e-49 ## 2 newspaper [200 × 2] <lm> budg… 0.0547 0.0166 3.30 1.15e- 3 ## 3 radio [200 × 2] <lm> (Int… 9.31 0.563 16.5 3.56e-39 ## 4 radio [200 × 2] <lm> budg… 0.202 0.0204 9.92 4.35e-19 ## 5 TV [200 × 2] <lm> (Int… 7.03 0.458 15.4 1.41e-35 ## 6 TV [200 × 2] <lm> budg… 0.0475 0.00269 17.7 1.47e-42 ## # … with 2 more variables: model_metrics <list>, model_fitted <list> ``` --- class: animated fadeIn ### Advertising data: nested structure The quantities nested in the tibble can be pulled out, or they can be *expanded* within the tibble itself, using `\(\texttt{unnest}\)` ```r avd_lm_models_nest %>% unnest(model_params) %>% select(medium,term:p.value) %>% kbl(digits = 4) %>% kable_styling(font_size = 10) %>% row_spec(1:2,background = "#D9FDEC") %>% row_spec(3:4,background = "#CAF6FC") %>% row_spec(5:6,background = "#FDB5BA") ``` <table class="table" style="font-size: 10px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> medium </th> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #D9FDEC !important;"> newspaper </td> <td style="text-align:left;background-color: #D9FDEC 
!important;"> (Intercept) </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 12.3514 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.6214 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 19.8761 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #D9FDEC !important;"> newspaper </td> <td style="text-align:left;background-color: #D9FDEC !important;"> budget </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0547 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0166 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 3.2996 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0011 </td> </tr> <tr> <td style="text-align:left;background-color: #CAF6FC !important;"> radio </td> <td style="text-align:left;background-color: #CAF6FC !important;"> (Intercept) </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 9.3116 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.5629 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 16.5422 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #CAF6FC !important;"> radio </td> <td style="text-align:left;background-color: #CAF6FC !important;"> budget </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.2025 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0204 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 9.9208 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #FDB5BA !important;"> TV </td> <td style="text-align:left;background-color: #FDB5BA 
!important;"> (Intercept) </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 7.0326 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.4578 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 15.3603 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #FDB5BA !important;"> TV </td> <td style="text-align:left;background-color: #FDB5BA !important;"> budget </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0475 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0027 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 17.6676 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0000 </td> </tr> </tbody> </table> -- The effect of budget on sales differs significantly from 0, for all the considered media --- class: animated fadeIn ### Ordinary least squares ** `$$\min_{\hat{\beta}_{0},\hat{\beta}_{1}}: RSS=\sum_{i=1}^{n}{e_{i}^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{y}_{i})^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}$$` ** -- ### OLS estimator of `\(\beta_{0}\)` `$$\begin{split}&\partial_{\hat{\beta}_{0}}\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}= -2\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})}=\\ &\sum_{i=1}^{n}{y_{i}}-n\hat{\beta}_{0}-\hat{\beta}_{1}\sum_{i=1}^{n}{x_{i}}=0\rightarrow \hat{\beta}_{0}=\bar{y}-\hat{\beta}_{1}\bar{x}\end{split}$$` --- class: animated fadeIn ### Ordinary least squares ** `$$\min_{\hat{\beta}_{0},\hat{\beta}_{1}}\sum_{i=1}^{n}{e_{i}^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{y}_{i})^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}$$` ** ### OLS estimator of `\(\beta_{1}\)` `$$\begin{split}&\partial_{\hat{\beta}_{1}}\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}= 
-2\sum_{i=1}^{n}{x_{i}(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})}= \sum_{i=1}^{n}{x_{i}y_{i}}-\hat{\beta}_{0}\sum_{i=1}^{n}{x_{i}}-\hat{\beta}_{1}\sum_{i=1}^{n}{x_{i}^{2}}=0\\ &\hat{\beta}_{1}\sum_{i=1}^{n}{x_{i}^{2}}=\sum_{i=1}^{n}{x_{i}y_{i}}-\sum_{i=1}^{n}{x_{i}}\left(\frac{\sum_{i=1}^{n}{y_{i}}}{n}-\hat{\beta}_{1}\frac{\sum_{i=1}^{n}{x_{i}}}{n}\right)\\ &\hat{\beta}_{1}\left(n\sum_{i=1}^{n}{x_{i}^{2}}-(\sum_{i=1}^{n}{x_{i}})^{2} \right)= n\sum_{i=1}^{n}{x_{i}y_{i}}-\sum_{i=1}^{n}{x_{i}}\sum_{i=1}^{n}{y_{i}}\\ &\hat{\beta}_{1}=\frac{n\sum_{i=1}^{n}{x_{i}y_{i}}-\sum_{i=1}^{n}{x_{i}}\sum_{i=1}^{n}{y_{i}}} {n\sum_{i=1}^{n}{x_{i}^{2}}-(\sum_{i=1}^{n}{x_{i}})^{2} }=\frac{\sigma_{xy}}{\sigma^{2}_{x}} \end{split}$$` --- class: animated fadeIn ### Three regression lines ```r *three_regs_plot=avd_lm_models_nest %>% unnest(model_fitted) %>% * select(medium,sales:.fitted) %>% * ggplot(aes(x=budget,y=sales))+theme_minimal()+facet_grid(~medium,scales="free")+ geom_point(alpha=.5,color = "indianred")+ geom_smooth(method="lm")+ geom_segment(aes(x=budget,xend=budget,y=.fitted,yend=sales),color="forestgreen",alpha=.25) ``` <img src="Linear-Regression-part_1_files/figure-html/unnamed-chunk-1-1.png" width="40%" style="display: block; margin: auto;" /> --- class: animated fadeIn ### Three regression lines ```r *three_regs_plot = avd_lm_models_nest %>% unnest(model_fitted) %>% * select(medium,sales:.fitted) %>% * ggplot(aes(x=budget,y=sales))+theme_minimal()+facet_grid(~medium,scales="free")+ * geom_point(alpha=.5,color = "indianred")+ geom_smooth(method="lm")+ geom_segment(aes(x=budget,xend=budget,y=.fitted,yend=sales),color="forestgreen",alpha=.25) ``` <img src="Linear-Regression-part_1_files/figure-html/unnamed-chunk-2-1.png" width="40%" style="display: block; margin: auto;" /> --- class: animated fadeIn ### Three regression lines ```r *three_regs_plot=avd_lm_models_nest %>% unnest(model_fitted) %>% * select(medium,sales:.fitted) %>% * 
ggplot(aes(x=budget,y=sales))+theme_minimal()+facet_grid(~medium,scales="free")+ * geom_point(alpha=.5,color = "indianred")+ * geom_smooth(method="lm")+ geom_segment(aes(x=budget,xend=budget,y=.fitted,yend=sales),color="forestgreen",alpha=.25) ``` <img src="Linear-Regression-part_1_files/figure-html/unnamed-chunk-3-1.png" width="40%" style="display: block; margin: auto;" /> --- class: animated fadeIn ### Three regression lines ```r *three_regs_plot=avd_lm_models_nest %>% unnest(model_fitted) %>% * select(medium,sales:.fitted) %>% * ggplot(aes(x=budget,y=sales))+theme_minimal()+facet_grid(~medium,scales="free")+ * geom_point(alpha=.5,color = "indianred")+ * geom_smooth(method="lm")+ * geom_segment(aes(x=budget,xend=budget,y=.fitted,yend=sales),color="forestgreen",alpha=.25) ``` <img src="Linear-Regression-part_1_files/figure-html/unnamed-chunk-4-1.png" width="40%" style="display: block; margin: auto;" /> --- class: animated fadeIn ### Advertising data: nested structure The quantities nested in the tibble can be pulled out, or they can be *expanded* within the tibble itself, using `\(\texttt{unnest}\)` ```r avd_lm_models_nest %>% unnest(model_metrics) %>% select(medium,r.squared:df.residual) %>% kbl(digits = 4) %>% kable_styling(font_size = 10) %>% row_spec(1,background = "#D9FDEC") %>% row_spec(2,background = "#CAF6FC") %>% row_spec(3,background = "#FDB5BA") ``` <table class="table" style="font-size: 10px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> medium </th> <th style="text-align:right;"> r.squared </th> <th style="text-align:right;"> adj.r.squared </th> <th style="text-align:right;"> sigma </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> logLik </th> <th style="text-align:right;"> AIC </th> <th style="text-align:right;"> BIC </th> <th style="text-align:right;"> deviance </th> <th 
style="text-align:right;"> df.residual </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #D9FDEC !important;"> newspaper </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0521 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0473 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 5.0925 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 10.8873 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0011 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 1 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> -608.3357 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 1222.671 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 1232.566 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 5134.805 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 198 </td> </tr> <tr> <td style="text-align:left;background-color: #CAF6FC !important;"> radio </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.3320 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.3287 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 4.2749 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 98.4216 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0000 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 1 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> -573.3369 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 1152.674 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 1162.569 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 3618.479 </td> <td style="text-align:right;background-color: 
#CAF6FC !important;"> 198 </td> </tr> <tr> <td style="text-align:left;background-color: #FDB5BA !important;"> TV </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.6119 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.6099 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 3.2587 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 312.1450 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0000 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 1 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> -519.0457 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 1044.091 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 1053.986 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 2102.531 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 198 </td> </tr> </tbody> </table> --- class: animated fadeIn ### Linear regression: model assumptions The linear regression model is ** `$$y_{i}=\beta_{0}+\beta_{1}x_{i}+\epsilon_{i}$$` ** where ** `\(\epsilon_{i}\)` ** is a random variable with expected value **0**. For inference, further assumptions must be made: - ** `\(\epsilon_{i}\sim N(0,\sigma^{2})\)` **, so the variance of the errors ** `\(\sigma^{2}\)` ** does not depend on ** `\(x_{i}\)` **. - ** `\(Cov(\epsilon_{i},\epsilon_{i'})=0\)` **, for all `\(i\neq i'\)` and `\(i,i'=1,\ldots,n\)`. 
- ** `\(x_{i}\)` ** non-stochastic -- It follows that ** `\(y_{i}\)` ** is a random variable such that - ** `\(E[y_{i}]=\beta_{0}+\beta_{1}x_{i}\)` ** - ** `\(Var[y_{i}]=\sigma^{2}\)` ** --- class: animated fadeIn ### Linear regression: model assumptions <center> <img src="./figures/RegAssum.png" alt="RegAssum" height="500px" /> </center> <center> Statistics for Business and Economics (Anderson, Sweeney and Williams, (2011)) </center> --- class: animated fadeIn ### `\(\sigma^{2}\)` estimator The variance ** `\(\sigma^{2}\)` ** is assumed to be constant, but it is unknown, and it has to be estimated: - since ** `\(y_{i}\sim N(\beta_{0}+\beta_{1}x_{i},\sigma^{2})\)` **, it follows that ** `$$\frac{y_{i}-\beta_{0}-\beta_{1}x_{i}}{\sigma}\sim N(0,1)$$` ** -- - furthermore, recall that ** `\(\sum_{i=1}^{n}\left[N(0,1)\right]^{2} = \chi^{2}_{n}\)` **, then ** `$$\sum_{i=1}^{n}\left[N(0,1)\right]^{2}=\sum_{i=1}^{n}\left[\frac{y_{i}-\beta_{0}-\beta_{1}x_{i}}{\sigma}\right]^{2}=\chi^{2}_{n}$$` ** -- - replacing ** `\(\beta_{0}\)` ** and ** `\(\beta_{1}\)` ** with their estimators ** `\(\hat{\beta}_{0}\)` ** and ** `\(\hat{\beta}_{1}\)` **, the previous becomes ** `$$\frac{\sum_{i=1}^{n}\left(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i}\right)^{2}}{\sigma^{2}}=\frac{RSS}{\sigma^{2}}=\chi^{2}_{n-2}$$` ** two degrees of freedom are lost because the two parameters are replaced by their estimators. --- class: animated fadeIn ### `\(\sigma^{2}\)` estimator - Finally, since ** `\(E\left[\chi^{2}_{n-2}\right]=n-2\)` ** then ** `$$E\left[\chi^{2}_{n-2}\right]=E\left[\frac{RSS}{\sigma^{2}}\right]=n-2$$` ** - and because ** `\(\sigma^{2}\)` ** is constant, it can be pulled out of the expectation to give ** `$$\frac{E\left[RSS\right]}{\sigma^{2}}=n-2$$` ** - it follows that ** `$$\sigma^{2}=E\left[\frac{RSS}{n-2}\right]$$` ** and ** `\(\frac{RSS}{n-2}\)` ** is an unbiased estimator of `\(\sigma^{2}\)`. 
- ** `\(\sqrt{\frac{RSS}{n-2}}=RSE\)` **, the so-called **residual standard error** --- class: animated fadeIn ### `\(\hat{\beta}_{1}\)` as a linear combination of `\(y_{i}\)` `$$\begin{split} \hat{\beta}_{1}&= \frac{\sum_{i=1}^{n}{\left(y_{i}-\bar{y}\right)\left(x_{i}-\bar{x}\right)} }{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}} =\frac{\sum_{i=1}^{n}{\left[y_{i} \left(x_{i}-\bar{x}\right) -\bar{y}\left(x_{i}-\bar{x}\right)\right]} }{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}}=\\ &=\frac{\sum_{i=1}^{n}{y_{i} \left(x_{i}-\bar{x}\right)} -\bar{y}\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)}}{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}} = \sum_{i=1}^{n}{y_{i} \frac{\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}}}= \sum_{i=1}^{n}{w_{i}y_{i}}\end{split}$$` Given a linear combination `\(U_{i}=c+d V_{i}\)`, if `\(V_{i}\sim N(\mu_{v},\sigma^{2}_{v})\)`, then `\(U_{i}\sim N(c+d\mu_{v},d^{2}\sigma^{2}_{v})\)`. Since `\(Y_{i}\sim N(\beta_{0}+\beta_{1}x_{i},\sigma^{2})\)`, and `\(\hat{\beta}_{1}=\sum_{i=1}^{n}{w_{i}y_{i}}\)` then `\(c=0\)` and `\(d=w_{i}\)`, then ** `$$\hat{\beta}_{1}\sim N(\sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})},\sum_{i=1}^{n}{w_{i}^{2}\sigma^{2}})$$` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)` The expectation of ** `\(\hat{\beta}_{1}\)` ** is - ** `\(E\left[\hat{\beta}_{1}\right] = \sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)` ** -- ** `\(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)` The expectation of ** `\(\hat{\beta}_{1}\)` ** is - ** `\(E\left[\hat{\beta}_{1}\right] = 
\sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)` ** ** `\(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\beta_{1}\frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)` The expectation of ** `\(\hat{\beta}_{1}\)` ** is - ** `\(E\left[\hat{\beta}_{1}\right] = \sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)` ** ** `\(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\beta_{1}\frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{\sum_{i=1}^{n}{x_{i}^2}-\bar{x}\sum_{i=1}^{n}{x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)` The expectation of ** `\(\hat{\beta}_{1}\)` ** is - ** `\(E\left[\hat{\beta}_{1}\right] = \sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)` ** ** 
`\(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\beta_{1}\frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{\sum_{i=1}^{n}{x_{i}^2}-\bar{x}\sum_{i=1}^{n}{x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\left(\sum_{i=1}^{n}{x_{i}^2}-\frac{\sum_{i=1}^{n}{x_{i}}}{n}\sum_{i=1}^{n}{x_{i}}\right)=\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)` The expectation of ** `\(\hat{\beta}_{1}\)` ** is - ** `\(E\left[\hat{\beta}_{1}\right] = \sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)` ** ** `\(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\beta_{1}\frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{\sum_{i=1}^{n}{x_{i}^2}-\bar{x}\sum_{i=1}^{n}{x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\left(\sum_{i=1}^{n}{x_{i}^2}-\frac{\sum_{i=1}^{n}{x_{i}}}{n}\sum_{i=1}^{n}{x_{i}}\right)=\)` ** ** `\(=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n}=\)` ** Note: `\(\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n} \frac{1}{n} =\frac{n\sum_{i=1}^{n}{x_{i}^2}}{n^{2}}-\frac{\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n^{2}}=\frac{\sum_{i=1}^{n}{x_{i}^2}}{n}-\bar{x}^{2}=var(x)\)` -- therefore `\(\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n} = var(x)n=\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\)` -- ** 
`\(=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n}=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}=\beta_{1}\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)` The variance of ** `\(\hat{\beta}_{1}\)` ** is ** `\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}}=\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)` The variance of ** `\(\hat{\beta}_{1}\)` ** is ** `\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}} =\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)` The variance of ** `\(\hat{\beta}_{1}\)` ** is ** `\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}} =\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}=\sum_{i=1}^{n}{\frac{(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)` The variance of ** `\(\hat{\beta}_{1}\)` ** is ** `\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}}
=\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}=\sum_{i=1}^{n}{\frac{(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)` The variance of ** `\(\hat{\beta}_{1}\)` ** is ** `\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}} =\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}=\sum_{i=1}^{n}{\frac{(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\sigma^{2}\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)` The variance of ** `\(\hat{\beta}_{1}\)` ** is ** `\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}} =\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}=\sum_{i=1}^{n}{\frac{(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\sigma^{2}\)` ** -- And the standard error of ** `\(\hat{\beta}_{1}\)` ** is ** `\(SE\left(\hat{\beta}_{1}\right) = \sqrt{\frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\)` ** --- class: animated fadeIn ### Inference on `\(\hat{\beta}_{1}\)`: hypothesis testing The null hypothesis is ** `\(\beta_{1}=0\)` ** and the test statistic is ** 
`$$\frac{\hat{\beta}_{1}-\beta_{1}}{SE(\hat{\beta}_{1})}$$` ** that, under the null hypothesis, becomes just ** `\(\frac{\hat{\beta}_{1}}{SE(\hat{\beta}_{1})}\)` ** distributed as a *Student t* with `\(n-2\)` d.f. ** `$$\text{Reject } H_{0} \text{ if } \left|\frac{\hat{\beta}_{1}}{SE(\hat{\beta}_{1})}\right|>t_{1-\frac{\alpha}{2},n-2}$$` ** --- class: animated fadeIn ### Linear regression: multiple predictors - ** `\(y\)` ** is the numeric response; ** `\(X_{1},\ldots,X_{p}\)` ** are the predictors; the general formula for the linear model is ** `$$y = f(X)+\epsilon=\beta_{0}+\sum_{j=1}^{p}X_{j}\beta_{j}+ \epsilon$$` ** -- In algebraic form ** `$${\bf y}={\bf X}{\bf \beta}+{\bf \epsilon}$$` ** `$$\begin{bmatrix} y_{1}\\ y_{2}\\ y_{3}\\ \vdots\\ y_{n} \end{bmatrix}=\begin{bmatrix} 1& x_{1,1}&\ldots&x_{1,p}\\ 1& x_{2,1}&\ldots&x_{2,p}\\ 1& x_{3,1}&\ldots&x_{3,p}\\ \vdots&\vdots&\ddots&\vdots\\ 1& x_{n,1}&\ldots&x_{n,p}\\ \end{bmatrix}\begin{bmatrix} \beta_{0}\\ \beta_{1}\\ \vdots\\ \beta_{p}\\ \end{bmatrix}+\begin{bmatrix} \epsilon_{1}\\ \epsilon_{2}\\ \epsilon_{3}\\ \vdots\\ \epsilon_{n} \end{bmatrix}$$` --- class: animated fadeIn ### Linear regression: ordinary least squares (OLS) ** OLS target ** ** `$$\min_{\hat{\beta}_{0},\hat{\beta}_{1},\ldots,\hat{\beta}_{p}}\sum_{i=1}^{n}\left(y_{i}-\hat{\beta}_{0}-\sum_{j=1}^{p}x_{ij}\hat{\beta}_{j}\right)^{2}=\min_{\hat{\beta}_{0},\hat{\beta}_{1},\ldots,\hat{\beta}_{p}}RSS$$` ** -- ** OLS target (algebraic) ** ### $$ \min_{\hat{\beta}} ({\bf y}-{\bf X}{\bf \hat{\beta}})^{\sf T}({\bf y}-{\bf X}{\bf \hat{\beta}})$$ --- class: animated fadeIn ### Linear regression: ordinary least squares (OLS) solution ** OLS target ** ** `$$\min_{\hat{\beta}}({\bf y}-{\bf X}{\bf \hat{\beta}})^{\sf T}({\bf y}-{\bf X}{\bf \hat{\beta}})=\min_{\hat{\beta}}\left({\bf y}^{\sf T}{\bf y}-{\bf y}^{\sf T}{\bf X} {\bf \hat{\beta}}-{\bf{\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf y}+{\bf {\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf X} {\bf \hat{\beta}}\right)$$` ** -- ** First
order conditions ** ** `$$\partial_{\hat{\beta}}\left({\bf y}^{\sf T}{\bf y}-{\bf y}^{\sf T}{\bf X} {\bf \hat{\beta}}-{\bf{\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf y}+{\bf {\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf X} {\bf \hat{\beta}}\right)=0$$` ** -- Since ** `\(\partial_{\bf\hat{\beta}}\left({\bf y}^{\sf T}{\bf X} {\bf \hat{\beta}}\right)=\partial_{\bf\hat{\beta}}\left({ \bf {\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf y}\right)={\bf X}^{\sf T}{\bf y}\)` ** it follows that ** `$$\partial_{\hat{\beta}}RSS=-2{\bf X}^{\sf T}{\bf y}+2{\bf X}^{\sf T}{\bf X} {\bf \hat{\beta}}=0$$` ** And the ** OLS solution ** is ### $$ \hat{\beta} = ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf y} $$ --- class: animated fadeIn ### Distributional assumptions Assuming ** `\(\epsilon \sim N(0,\sigma^{2}{\bf I})\)` ** and non-stochastic predictors, it follows that ** `$$E[{\bf y}]= E[{\bf X}\beta+\epsilon]={\bf X}\beta \ \ \text{ and } \ \ var[{\bf y}]= \sigma^{2}{\bf I}$$` ** -- ### Distribution of linear combinations ** $$ \text{if } \quad {\bf U}\sim N({\bf \mu},{\bf\Sigma}) \quad \text{ and } \quad {\bf V} = {\bf c}+{\bf D}{\bf U} \quad \text{then} \quad {\bf V}\sim N({\bf c}+{\bf D}\mu,{\bf D}{\bf \Sigma}{\bf D}^{\sf T})$$ ** -- ### Distribution of `\(\hat{\beta}\)` ** `\(\hat{\beta} = ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf y}\)` ** is a linear combination of ** `\(\bf y\)` **, with ** `\({\bf c} = 0\)` ** and ** `\({\bf D} = ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}\)` **, then -- - ** `\(E[\hat{\beta}]={\bf D} E[y] = {\bf D}{\bf X}\beta= ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf X}\beta = \beta\)` ** -- - ** `\(var(\hat{\beta})= {\bf D} \sigma^{2}{\bf D}^{\sf T}=({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}\sigma^{2}[({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}]^{\sf T} = \sigma^{2}({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf X}({\bf X}^{\sf T}{\bf X})^{-1}=\sigma^{2}({\bf X}^{\sf T}{\bf X})^{-1}\)` ** --- class: animated fadeIn ### Advertising data: multiple
linear regression At this stage there is no pre-processing and no test-set evaluation: just fit the regression model on the whole data set ** pre-processing: define a `\(\texttt{recipe}\)` ** ```r adv_recipe = recipe(sales~., data=adv_data) ``` ** model specification: define a `\(\texttt{parsnip}\)` model ** ```r adv_model = linear_reg(mode="regression", engine="lm") ``` ** define the `\(\texttt{workflow}\)` ** ```r adv_wflow = workflow() %>% add_recipe(recipe=adv_recipe) %>% add_model(adv_model) ``` ** fit the model ** ```r adv_fit = adv_wflow %>% fit(data=adv_data) ``` --- class: animated fadeIn ### Advertising data: back to single models .pull-left[ <table class="table" style="font-size: 10px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> medium </th> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #D9FDEC !important;"> newspaper </td> <td style="text-align:left;background-color: #D9FDEC !important;"> (Intercept) </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 12.3514 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.6214 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 19.8761 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #D9FDEC !important;"> newspaper </td> <td style="text-align:left;background-color: #D9FDEC !important;"> budget </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0547 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0166 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 3.2996 </td> <td
style="text-align:right;background-color: #D9FDEC !important;"> 0.0011 </td> </tr> <tr> <td style="text-align:left;background-color: #CAF6FC !important;"> radio </td> <td style="text-align:left;background-color: #CAF6FC !important;"> (Intercept) </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 9.3116 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.5629 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 16.5422 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #CAF6FC !important;"> radio </td> <td style="text-align:left;background-color: #CAF6FC !important;"> budget </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.2025 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0204 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 9.9208 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #FDB5BA !important;"> TV </td> <td style="text-align:left;background-color: #FDB5BA !important;"> (Intercept) </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 7.0326 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.4578 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 15.3603 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #FDB5BA !important;"> TV </td> <td style="text-align:left;background-color: #FDB5BA !important;"> budget </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0475 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0027 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 17.6676 </td> <td 
style="text-align:right;background-color: #FDB5BA !important;"> 0.0000 </td> </tr> </tbody> </table> ] .pull-right[ <img src="Linear-Regression-part_1_files/figure-html/unnamed-chunk-5-1.png" width="75%" style="display: block; margin: auto;" /> ] --- class: animated fadeIn ### Advertising data: single vs multiple regression .pull-left[ ```r avd_lm_models_nest %>% unnest(model_params) %>% select(medium,term:p.value) %>% filter(term!="(Intercept)") %>% arrange(medium) %>% kbl(digits = 4) %>% kable_styling(font_size = 12) %>% column_spec(c(3,6),bold=TRUE) %>% row_spec(1,background = "#D9FDEC") %>% row_spec(2,background = "#CAF6FC") %>% row_spec(3,background = "#FDB5BA") ``` <table class="table" style="font-size: 12px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> medium </th> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #D9FDEC !important;"> newspaper </td> <td style="text-align:left;background-color: #D9FDEC !important;"> budget </td> <td style="text-align:right;font-weight: bold;background-color: #D9FDEC !important;"> 0.0547 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0166 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 3.2996 </td> <td style="text-align:right;font-weight: bold;background-color: #D9FDEC !important;"> 0.0011 </td> </tr> <tr> <td style="text-align:left;background-color: #CAF6FC !important;"> radio </td> <td style="text-align:left;background-color: #CAF6FC !important;"> budget </td> <td style="text-align:right;font-weight: bold;background-color: #CAF6FC !important;"> 0.2025 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0204 </td> <td style="text-align:right;background-color: 
#CAF6FC !important;"> 9.9208 </td> <td style="text-align:right;font-weight: bold;background-color: #CAF6FC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #FDB5BA !important;"> TV </td> <td style="text-align:left;background-color: #FDB5BA !important;"> budget </td> <td style="text-align:right;font-weight: bold;background-color: #FDB5BA !important;"> 0.0475 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0027 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 17.6676 </td> <td style="text-align:right;font-weight: bold;background-color: #FDB5BA !important;"> 0.0000 </td> </tr> </tbody> </table> ] .pull-right[ ```r adv_fit %>% tidy() %>% filter(term!="(Intercept)") %>% arrange(term) %>% kbl(digits = 4) %>% kable_styling(font_size = 12) %>% column_spec(c(2,5),bold=TRUE) %>% row_spec(1,background = "#D9FDEC") %>% row_spec(2,background = "#CAF6FC") %>% row_spec(3,background = "#FDB5BA") ``` <table class="table" style="font-size: 12px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #D9FDEC !important;"> newspaper </td> <td style="text-align:right;font-weight: bold;background-color: #D9FDEC !important;"> -0.0010 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0059 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> -0.1767 </td> <td style="text-align:right;font-weight: bold;background-color: #D9FDEC !important;"> 0.8599 </td> </tr> <tr> <td style="text-align:left;background-color: #CAF6FC !important;"> radio </td> <td style="text-align:right;font-weight: bold;background-color: #CAF6FC !important;"> 0.1885 </td> <td 
style="text-align:right;background-color: #CAF6FC !important;"> 0.0086 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 21.8935 </td> <td style="text-align:right;font-weight: bold;background-color: #CAF6FC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #FDB5BA !important;"> TV </td> <td style="text-align:right;font-weight: bold;background-color: #FDB5BA !important;"> 0.0458 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0014 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 32.8086 </td> <td style="text-align:right;font-weight: bold;background-color: #FDB5BA !important;"> 0.0000 </td> </tr> </tbody> </table> ] -- .center[** why these (apparently) contradictory results? **] --- class: animated fadeIn ### Advertising data: single vs multiple regression ```r library("corrr") adv_data %>% correlate() %>% network_plot() ``` <img src="Linear-Regression-part_1_files/figure-html/unnamed-chunk-6-1.png" width="30%" style="display: block; margin: auto;" /> **Note**: - ** `\(\texttt{newspaper}\)` ** is correlated with ** `\(\texttt{radio}\)` **, the latter having a significant effect on ** `\(\texttt{sales}\)` ** in the multiple regression model -- - that is why, in the single model, ** `\(\texttt{newspaper}\)` ** has a significant effect on ** `\(\texttt{sales}\)` ** (** `\(\texttt{radio}\)` ** is ignored) --- class: animated fadeIn center middle inverse # regression by successive orthogonalizations --- class: animated fadeIn center middle inverse ### computing multiple regression coefficients ### by means of single regressions --- class: animated fadeIn ### Algebraic formalization of single regression Consider the data to be centered ** `\(\bar{y}=\bar{x}=0\)` **: this means that ** `\(\hat{\beta}_{0}=0\)` ** (no intercept model), ** `\({\bf y}=\beta_{1}{\bf x}+\epsilon\)` ** -- The ** `\(\hat{\beta}_{1}\)` ** estimator **
`$$\hat{\beta}_{1}=\frac{\sum_{i=1}^{n}{x_{i}y_{i}}}{\sum_{i=1}^{n}{x_{i}^{2}}}=\frac{{\bf x}^{\sf T}{\bf y}}{{\bf x}^{\sf T}{\bf x}}=\frac{\langle {\bf x},{\bf y}\rangle}{\langle {\bf x},{\bf x}\rangle}$$` ** `\({\langle {\bf a},{\bf b}\rangle}\)` is the inner product `\({\bf a}^{\sf T}{\bf b}\)`. -- Consider the two predictors model ** `\(y = \beta_{1}X_{1}+\beta_{2}X_{2}+\epsilon\)` ** - **Note** if the two predictors are such that `\({\bf x}_{1}^{\sf T}{\bf x}_{2} = 0\)` then ** `\(\hat{\beta}_{1}\)` ** and ** `\(\hat{\beta}_{2}\)` ** can be equivalently computed by - fitting the multiple regression ** `\(y = \beta_{1}X_{1}+\beta_{2}X_{2}+\epsilon\)` ** - or by fitting the single regressions ** `\(y = \beta_{1}X_{1}+\epsilon\)` ** and ** `\(y = \beta_{2}X_{2}+\epsilon\)` ** --- class: animated fadeIn ### the orthogonal predictors case Check the previous claim, given the predictors matrix `$${\bf X}=\left[{ \begin{bmatrix} \\ \\ {\bf x}_{1} \\ \\ \\ \end{bmatrix} \begin{bmatrix} \\ \\ {\bf x}_{2} \\ \\ \\ \end{bmatrix}} \right]$$` and that ** `$${\bf \hat{\beta}}=\left({\bf X}^{\sf T}{\bf X}\right)^{-1}{\bf X}^{\sf T}{\bf y} \rightarrow \left({\bf X}^{\sf T}{\bf X}\right)\hat{\beta} - {\bf X}^{\sf T}{\bf y}={\bf X}^{\sf T}\left({\bf X}\hat{\beta} - {\bf y}\right)=0$$` ** --- class: animated fadeIn ### the non-orthogonal predictors case In real data, the predictors are hardly ever all pairwise orthogonal. -- - single and multiple regression coefficient estimates will differ -- - multiple regression coefficient estimates can still be obtained from single regressions via...
-- .center[ **successive orthogonalizations** ] --- class: animated fadeIn ### Regression via successive orthogonalizations Consider the four predictors model (with no intercept) ** `$$y=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\beta_{4}X_{4}+\epsilon$$` ** -- ** Compute the value for `\(\hat{\beta}_{4}\)` using successive orthogonalizations ** set ** `\({\bf z}_{1}={\bf x}_{1}\)` ** .my-pull-left[ ** step 1 ** ] .my-pull-right[ > - fit ** `\({\bf x}_{2}=\beta_{2|1}{\bf z}_{1}+\epsilon\)` ** > > - compute ** `\(\hat{\beta}_{2|1}=\frac{\langle {\bf z}_{1},{\bf x}_{2}\rangle}{\langle {\bf z}_{1},{\bf z}_{1}\rangle}\)` ** > > - compute ** `\({\bf z}_{2}={\bf x}_{2}-\hat{\beta}_{2|1}{\bf z}_{1}\)` ** ] --- class: animated fadeIn ### Regression via successive orthogonalizations Consider the four predictors model (with no intercept) ** `$$y=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\beta_{4}X_{4}+\epsilon$$` ** ** Compute the value for `\(\hat{\beta}_{4}\)` using successive orthogonalizations ** .my-pull-left[ ** step 2 ** ] .my-pull-right[ > - fit ** `\({\bf x}_{3}=\beta_{3|1}{\bf z}_{1}+\epsilon\)` ** and ** `\({\bf x}_{3}=\beta_{3|2}{\bf z}_{2}+\epsilon\)` ** > > - compute ** `\(\hat{\beta}_{3|1}=\frac{\langle {\bf z}_{1},{\bf x}_{3}\rangle}{\langle {\bf z}_{1},{\bf z}_{1}\rangle}\)` ** and ** `\(\hat{\beta}_{3|2}=\frac{\langle {\bf z}_{2},{\bf x}_{3}\rangle}{\langle {\bf z}_{2},{\bf z}_{2}\rangle}\)` ** > > - compute ** `\({\bf z}_{3}={\bf x}_{3}-\hat{\beta}_{3|1}{\bf z}_{1}-\hat{\beta}_{3|2}{\bf z}_{2}\)` ** ] --- class: animated fadeIn ### Regression via successive orthogonalizations Consider the four predictors model (with no intercept) ** `$$y=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\beta_{4}X_{4}+\epsilon$$` ** ** Compute the value for `\(\hat{\beta}_{4}\)` using successive orthogonalizations ** .my-pull-left[ ** step 3 ** ] .my-pull-right[ > - fit ** `\({\bf x}_{4}=\beta_{4|1}{\bf z}_{1}+\epsilon\)` **,** `\({\bf x}_{4}=\beta_{4|2}{\bf z}_{2}+\epsilon\)` ** and
** `\({\bf x}_{4}=\beta_{4|3}{\bf z}_{3}+\epsilon\)` ** > > - compute ** `\(\hat{\beta}_{4|1}=\frac{\langle {\bf z}_{1},{\bf x}_{4}\rangle}{\langle {\bf z}_{1},{\bf z}_{1}\rangle}\)` **, ** `\(\hat{\beta}_{4|2}=\frac{\langle {\bf z}_{2},{\bf x}_{4}\rangle}{\langle {\bf z}_{2},{\bf z}_{2}\rangle}\)` ** and ** `\(\hat{\beta}_{4|3}=\frac{\langle {\bf z}_{3},{\bf x}_{4}\rangle}{\langle {\bf z}_{3},{\bf z}_{3}\rangle}\)` ** > > - compute ** `\({\bf z}_{4}={\bf x}_{4}-\hat{\beta}_{4|1}{\bf z}_{1}-\hat{\beta}_{4|2}{\bf z}_{2}-\hat{\beta}_{4|3}{\bf z}_{3}\)` ** ] --- class: animated fadeIn ### Regression via successive orthogonalizations Consider the four predictors model (with no intercept) ** `$$y=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\beta_{4}X_{4}+\epsilon$$` ** ** Compute the value for `\(\hat{\beta}_{4}\)` using successive orthogonalizations ** .my-pull-left[ ** step 4 ** ] .my-pull-right[ > - fit ** `\({\bf y}=\beta_{4}{\bf z}_{4}+\epsilon\)` ** > > - compute ** `\(\hat{\beta}_{4}=\frac{\langle {\bf z}_{4},{\bf y}\rangle}{\langle {\bf z}_{4},{\bf z}_{4}\rangle}\)` ** ] --- class: animated fadeIn ### Regression via successive orthogonalizations: a further look - The residuals vector ** `\({\bf e}=\left({\bf y}-{\bf \hat{y}}\right)\)` ** is orthogonal to the predictor ** `\({\bf x}\)` **, that is ** `\(\left({\bf y}-{\bf \hat{y}}\right)^{\sf T}{\bf x}=0\)` ** `$$\begin{split} ({\bf y}-{\bf \hat{y}})^{\sf T}{\bf x}&={\bf y}^{\sf T}{\bf x}-{\bf \hat{y}}^{\sf T}{\bf x} \\ &={\bf y}^{\sf T}{\bf x} - \underbrace{({\bf x} ({\bf x}^{\sf T}{\bf x})^{-1}{\bf x}^{\sf T}{\bf y})^{\sf T}}_{\bf \hat{y}^{\sf T}}{\bf x} \\ &={\bf y}^{\sf T}{\bf x}- {\bf y}^{\sf T}{\bf x} ({\bf x}^{\sf T}{\bf x})^{-1}{\bf x}^{\sf T}{\bf x}= {\bf y}^{\sf T}{\bf x}- {\bf y}^{\sf T}{\bf x}=0 \end{split}$$` -- - recall that ** `\({\bf z}_{2}={\bf x}_{2}-\hat{\beta}_{2|1}{\bf z}_{1}\)` **: ** `\({\bf z}_{2}\)` ** is orthogonal to ** `\({\bf z}_{1}\)` **, and since ** `\({\bf z}_{1}={\bf x}_{1}\)` **, **
`\({\bf z}_{2}\)` ** is orthogonal to ** `\({\bf x}_{1}\)` **, too. -- - ** `\({\bf z}_{2}\)` ** is ** `\({\bf x}_{2}\)` ** _adjusted_ to be orthogonal to `\({\bf z}_{1}\)` ( `\({\bf x}_{1}\)` ). -- - ** `\({\bf z}_{3}\)` ** is ** `\({\bf x}_{3}\)` ** _adjusted_ to be orthogonal to `\({\bf z}_{2}\)` and to `\({\bf z}_{1}\)`. - ** `\({\bf z}_{4}\)` ** is ** `\({\bf x}_{4}\)` ** _adjusted_ to be orthogonal to `\({\bf z}_{3}\)`, `\({\bf z}_{2}\)` and to `\({\bf z}_{1}\)`. ** the multiple regression-based `\(\hat{\beta}_{j}\)` can be obtained via the single regression `\({\bf y}=\hat{\beta}_{j}{\bf z}_{j}+\epsilon\)` ** - ** `\({\bf z}_{j}\)` ** is ** `\({\bf x}_{j}\)` ** _adjusted_ with respect to the other predictors (orthogonalizing with `\({\bf x}_{j}\)` as the last predictor makes the argument work for any `\(j\)`). --- class: animated fadeIn ### Regression via successive orthogonalizations: some considerations ** the multiple regression-based `\(\hat{\beta}_{j}\)` can be obtained via the single regression `\({\bf y}=\hat{\beta}_{j}{\bf z}_{j}+\epsilon\)` ** - If the predictors are pairwise orthogonal, the single-regression coefficients of predictor `\(j\)` on each residual vanish, `\(\hat{\beta}_{j|i}=0\)`, so `\({\bf z}_{j}={\bf x}_{j}, j= 1,\ldots,p\)`. -- - So ** `\(\hat{\beta}_{j}=\frac{\langle {\bf z}_{j}, {\bf y}\rangle}{\langle {\bf z}_{j}, {\bf z}_{j}\rangle}=\frac{\langle {\bf x}_{j}, {\bf y}\rangle}{\langle {\bf x}_{j}, {\bf x}_{j}\rangle}\)` **, just a single regression of `\({\bf y}\)` on `\({\bf x}_{j}\)` -- - if ** `\({\bf x}_{j}\)` ** is highly correlated with one or more of the other predictors, the squared norm ** `\(\|{\bf z}_{j}\|^{2}\approx 0\)` ** - this makes sense, since the predictor in question adds almost no new information about `\({\bf y}\)` -- - as ** `\(\|{\bf z}_{j}\|^{2} = \langle {\bf z}_{j}, {\bf z}_{j}\rangle\approx 0\)` **, the variability of the OLS estimator explodes, since ** `\(SE(\hat{\beta}_{j})=\frac{\sigma}{\sqrt{\|{\bf z}_{j}\|^{2}}}\)` **
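--- class: animated fadeIn ### Regression via successive orthogonalizations: a numeric check The two claims above can be verified directly in R. This is a minimal sketch on simulated data (all object names, `x1`, `z3`, etc., are illustrative and not from the Advertising example); the intercept is handled by centering, playing the role of a `\({\bf z}_{0}\)`. ```r # numeric check of regression via successive orthogonalizations # (simulated data; every name here is illustrative) set.seed(1) n <- 100 x1 <- rnorm(n) x2 <- 0.5 * x1 + rnorm(n) # x2 correlated with x1 x3 <- rnorm(n) y <- 1 + 2 * x1 - x2 + 0.5 * x3 + rnorm(n) # successive orthogonalizations z1 <- x1 - mean(x1) # x1 adjusted for the intercept z2 <- residuals(lm(x2 ~ z1)) # x2 adjusted for z1 (and the intercept) z3 <- residuals(lm(x3 ~ z1 + z2)) # x3 adjusted for z1 and z2 # the single regression of y on z3 reproduces the multiple-regression coefficient beta3_single <- sum(z3 * y) / sum(z3 * z3) fit <- lm(y ~ x1 + x2 + x3) all.equal(beta3_single, unname(coef(fit)["x3"])) # TRUE # and SE(beta_3) = sigma / ||z3|| all.equal(sigma(fit) / sqrt(sum(z3^2)), summary(fit)$coefficients["x3", "Std. Error"]) # TRUE ``` The first check is the slide's main claim; the second shows why `\(\|{\bf z}_{j}\|^{2}\approx 0\)` inflates `\(SE(\hat{\beta}_{j})\)`: the standard error reported by `\(\texttt{lm}\)` is exactly `\(\hat{\sigma}/\|{\bf z}_{j}\|\)` for the last orthogonalized predictor.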