class: center, middle, inverse, title-slide # Linear Regression (part 1) ## with tidymodels ### Statistical Learning ### Alfonso Iodice D'Enza --- class: animated fadeIn center middle inverse # prologue... --- class: animated fadeIn ### supervised learning flow .my-pull-left[ **pre-processing** ] .my-pull-right[ > - split your observations into **training**, **validation** (or, cross-validate) and **test** > > - transform the predictors properly (** feature engineering **) ] -- .my-pull-left[ **model-spec** ] .my-pull-right[ > - **specify** the model to fit ] -- .my-pull-left[ **tuning** ] .my-pull-right[ > - select a reasonable grid of model **hyperparameter(s)** values to choose from > > - for each combination of hyperparameters > - **fit** the model on training observations > - compute appropriate **metrics** on evaluation observations > > - pick the **best** hyperparameter combination ] -- .my-pull-left[ **final evaluation and fit** ] .my-pull-right[ > - compute the metric for the **tuned model** on the **test** set (observations never used yet) > > - obtain the **final fit** for the model on all the available observations ] --- class: animated fadeIn ### the tidymodels metapackage ** Each core package in tidymodels covers one step of the supervised learning flow ** <center> <img src="./figures/tidymodels.png" alt="tidymodels logo" height="300px" /> </center> ### For all things tidymodels check [tidymodels.org](https://www.tidymodels.org)! 
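--- class: animated fadeIn ### the flow in code The whole flow can be sketched with the core packages. A minimal, illustrative sketch: the split proportion, the normalization step and the use of the `\(\texttt{Advertising}\)` data here are assumptions for the example, not part of the analysis that follows.

```r
library(tidymodels)

set.seed(123)
# rsample: split the observations into training and test sets
adv_split = initial_split(adv_data, prop = 0.75)
# recipes: pre-processing / feature engineering
adv_rec = recipe(sales ~ ., data = training(adv_split)) %>%
  step_normalize(all_numeric_predictors())
# parsnip: model specification
lm_spec = linear_reg() %>% set_engine("lm")
# workflows: bundle pre-processing and model together
adv_wf = workflow() %>% add_recipe(adv_rec) %>% add_model(lm_spec)
# fit on the training set, then compute yardstick metrics on the test set
adv_fit = fit(adv_wf, data = training(adv_split))
adv_fit %>%
  augment(new_data = testing(adv_split)) %>%
  metrics(truth = sales, estimate = .pred)
```

Each line maps onto one of the packages introduced next.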
--- class: animated fadeIn ### the tidymodels core ** Each core package in tidymodels covers one step of the supervised learning flow ** <center> <img src="./figures/tidymodels_all.jpeg" alt="all" height="500px" /> </center> --- class: animated fadeIn ### the tidymodels core the ** `\(\texttt{rsample}\)` ** package provides tools for data splitting and resampling <center> <img src="./figures/tidymodels_rsample.jpeg" alt="rsample" height="500px" /> </center> --- class: animated fadeIn ### the tidymodels core the ** `\(\texttt{recipes}\)` ** package provides tools for data pre-processing and feature engineering <center> <img src="./figures/tidymodels_recipes.jpeg" alt="recipes" height="500px" /> </center> --- class: animated fadeIn ### the tidymodels core the ** `\(\texttt{parsnip}\)` ** package provides a unified interface to the many models available in R <center> <img src="./figures/tidymodels_parsnip.jpeg" alt="parsnip" height="500px" /> </center> --- class: animated fadeIn ### the tidymodels core the ** `\(\texttt{workflows}\)` ** package combines pre-processing, modeling and post-processing steps <center> <img src="./figures/tidymodels_workflows.jpeg" alt="workflows" height="500px" /> </center> --- class: animated fadeIn ### the tidymodels core the ** `\(\texttt{yardstick}\)` ** package provides several performance metrics <center> <img src="./figures/tidymodels_yardstick.jpeg" alt="yardstick" height="500px" /> </center> --- class: animated fadeIn ### the tidymodels core the ** `\(\texttt{dials}\)` ** package provides tools to define hyperparameter value grids <center> <img src="./figures/tidymodels_dials.jpeg" alt="dials" height="500px" /> </center> --- class: animated fadeIn ### the tidymodels core the ** `\(\texttt{tune}\)` ** package greatly simplifies the implementation of hyperparameter optimization <center> <img src="./figures/tidymodels_tune.jpeg" alt="tune" height="500px" /> </center> --- class: animated fadeIn ### the 
tidymodels core the ** `\(\texttt{broom}\)` ** package provides utility functions to tidify model output <center> <img src="./figures/tidymodels_broom.jpeg" alt="tune" height="500px" /> </center> --- class: animated fadeIn center middle inverse # ...end of prologue --- class: animated fadeIn ### Advertising data ```r adv_data = read_csv(file="./data/Advertising.csv") %>% select(-1) adv_data %>% slice_sample(n = 8) %>% kbl() %>% kable_styling(font_size=10) ``` <table class="table" style="font-size: 10px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> TV </th> <th style="text-align:right;"> radio </th> <th style="text-align:right;"> newspaper </th> <th style="text-align:right;"> sales </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 25.1 </td> <td style="text-align:right;"> 25.7 </td> <td style="text-align:right;"> 43.3 </td> <td style="text-align:right;"> 8.5 </td> </tr> <tr> <td style="text-align:right;"> 198.9 </td> <td style="text-align:right;"> 49.4 </td> <td style="text-align:right;"> 60.0 </td> <td style="text-align:right;"> 23.7 </td> </tr> <tr> <td style="text-align:right;"> 184.9 </td> <td style="text-align:right;"> 43.9 </td> <td style="text-align:right;"> 1.7 </td> <td style="text-align:right;"> 20.7 </td> </tr> <tr> <td style="text-align:right;"> 69.2 </td> <td style="text-align:right;"> 20.5 </td> <td style="text-align:right;"> 18.3 </td> <td style="text-align:right;"> 11.3 </td> </tr> <tr> <td style="text-align:right;"> 163.5 </td> <td style="text-align:right;"> 36.8 </td> <td style="text-align:right;"> 7.4 </td> <td style="text-align:right;"> 18.0 </td> </tr> <tr> <td style="text-align:right;"> 234.5 </td> <td style="text-align:right;"> 3.4 </td> <td style="text-align:right;"> 84.8 </td> <td style="text-align:right;"> 11.9 </td> </tr> <tr> <td style="text-align:right;"> 18.8 </td> <td style="text-align:right;"> 21.7 </td> <td style="text-align:right;"> 50.4 </td> <td style="text-align:right;"> 7.0 
</td> </tr> <tr> <td style="text-align:right;"> 131.1 </td> <td style="text-align:right;"> 42.8 </td> <td style="text-align:right;"> 28.9 </td> <td style="text-align:right;"> 18.0 </td> </tr> </tbody> </table> - ** sales ** is the response, indicating the level of sales in a specific market - ** TV **, ** Radio ** and ** Newspaper ** are the predictors, indicating the advertising budget spent on the corresponding media --- class: animated fadeIn ### Advertising data ```r adv_data_tidy = adv_data %>% pivot_longer(names_to="medium",values_to="budget",cols = 1:3) adv_data_tidy %>% slice_sample(n = 8) %>% kbl() %>% kable_styling(font_size=10) ``` <table class="table" style="font-size: 10px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:right;"> sales </th> <th style="text-align:left;"> medium </th> <th style="text-align:right;"> budget </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 15.2 </td> <td style="text-align:left;"> newspaper </td> <td style="text-align:right;"> 65.7 </td> </tr> <tr> <td style="text-align:right;"> 25.5 </td> <td style="text-align:left;"> radio </td> <td style="text-align:right;"> 42.0 </td> </tr> <tr> <td style="text-align:right;"> 18.9 </td> <td style="text-align:left;"> newspaper </td> <td style="text-align:right;"> 22.9 </td> </tr> <tr> <td style="text-align:right;"> 14.8 </td> <td style="text-align:left;"> TV </td> <td style="text-align:right;"> 280.2 </td> </tr> <tr> <td style="text-align:right;"> 10.8 </td> <td style="text-align:left;"> newspaper </td> <td style="text-align:right;"> 5.8 </td> </tr> <tr> <td style="text-align:right;"> 15.9 </td> <td style="text-align:left;"> radio </td> <td style="text-align:right;"> 16.7 </td> </tr> <tr> <td style="text-align:right;"> 18.9 </td> <td style="text-align:left;"> radio </td> <td style="text-align:right;"> 27.5 </td> </tr> <tr> <td style="text-align:right;"> 14.6 </td> <td style="text-align:left;"> TV </td> <td style="text-align:right;"> 78.2 </td> 
</tr> </tbody> </table> --- class: animated fadeIn ### Advertising data ```r adv_data_tidy %>% ggplot(aes(x = budget, y = sales)) + theme_minimal() + facet_wrap(~medium,scales = "free") + geom_point(color="indianred",alpha=.5,size=3) ``` <img src="Linear-Regression-part_1_files/figure-html/advert_plot-1.png" width="50%" style="display: block; margin: auto;" /> -- **Note**: the budget spent on TV is up to 300, less so for Radio and Newspaper --- class: animated fadeIn ### Advertising data: single regressions We can be **naive** and regress ** `\(\texttt{sales}\)` ** on ** `\(\texttt{tv}\)` **, ** `\(\texttt{newspaper}\)` ** and ** `\(\texttt{radio}\)` **, separately. --- class: animated fadeIn ### Advertising data: single regressions ```r *avd_lm_models_nest = adv_data_tidy %>% * group_by(medium) %>% * group_nest(.key = "datasets") %>% mutate( model_output = map(.x=datasets,~lm(sales~budget,data=.x)), model_params = map(.x=model_output, ~tidy(.x)), model_metrics = map(.x=model_output, ~glance(.x)), model_fitted = map(.x=model_output, ~augment(.x)) ) ``` ``` ## # A tibble: 3 × 2 ## medium datasets ## <chr> <list<tibble[,2]>> ## 1 newspaper [200 × 2] ## 2 radio [200 × 2] ## 3 TV [200 × 2] ``` --- class: animated fadeIn ### Advertising data: single regressions ```r avd_lm_models_nest = adv_data_tidy %>% group_by(medium) %>% group_nest(.key = "datasets") %>% * mutate( * model_output = map(.x=datasets,~lm(sales~budget,data=.x)), model_params = map(.x=model_output, ~tidy(.x)), model_metrics = map(.x=model_output, ~glance(.x)), model_fitted = map(.x=model_output, ~augment(.x)) * ) ``` ``` ## # A tibble: 3 × 3 ## medium datasets model_output ## <chr> <list<tibble[,2]>> <list> ## 1 newspaper [200 × 2] <lm> ## 2 radio [200 × 2] <lm> ## 3 TV [200 × 2] <lm> ``` --- class: animated fadeIn ### Advertising data: single regressions ```r avd_lm_models_nest = adv_data_tidy %>% group_by(medium) %>% group_nest(.key = "datasets") %>% * mutate( * model_output = 
map(.x=datasets,~lm(sales~budget,data=.x)), * model_params = map(.x=model_output, ~tidy(.x)), model_metrics = map(.x=model_output, ~glance(.x)), model_fitted = map(.x=model_output, ~augment(.x)) * ) ``` ``` ## # A tibble: 3 × 4 ## medium datasets model_output model_params ## <chr> <list<tibble[,2]>> <list> <list> ## 1 newspaper [200 × 2] <lm> <tibble [2 × 5]> ## 2 radio [200 × 2] <lm> <tibble [2 × 5]> ## 3 TV [200 × 2] <lm> <tibble [2 × 5]> ``` --- class: animated fadeIn ### Advertising data: single regressions ```r avd_lm_models_nest = adv_data_tidy %>% group_by(medium) %>% group_nest(.key = "datasets") %>% * mutate( * model_output = map(.x=datasets,~lm(sales~budget,data=.x)), * model_params = map(.x=model_output, ~tidy(.x)), * model_metrics = map(.x=model_output, ~glance(.x)), model_fitted = map(.x=model_output, ~augment(.x)) * ) ``` ``` ## # A tibble: 3 × 5 ## medium datasets model_output model_params model_metrics ## <chr> <list<tibble[,2]>> <list> <list> <list> ## 1 newspaper [200 × 2] <lm> <tibble [2 × 5]> <tibble [1 × 12]> ## 2 radio [200 × 2] <lm> <tibble [2 × 5]> <tibble [1 × 12]> ## 3 TV [200 × 2] <lm> <tibble [2 × 5]> <tibble [1 × 12]> ``` --- class: animated fadeIn ### Advertising data: single regressions ```r avd_lm_models_nest = adv_data_tidy %>% group_by(medium) %>% group_nest(.key = "datasets") %>% * mutate( * model_output = map(.x=datasets,~lm(sales~budget,data=.x)), * model_params = map(.x=model_output, ~tidy(.x)), * model_metrics = map(.x=model_output, ~glance(.x)), * model_fitted = map(.x=model_output, ~augment(.x)) * ) ``` ``` ## # A tibble: 3 × 6 ## medium datasets model_output model_params model_metrics model_fitted ## <chr> <list<tibble[,> <list> <list> <list> <list> ## 1 newspaper [200 × 2] <lm> <tibble> <tibble> <tibble> ## 2 radio [200 × 2] <lm> <tibble> <tibble> <tibble> ## 3 TV [200 × 2] <lm> <tibble> <tibble> <tibble> ``` -- This is a ** nested data structure ** with everything stored. 
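Individual elements can be pulled back out of the nest; for instance (an illustrative extraction, using standard `\(\texttt{dplyr}\)`/`\(\texttt{purrr}\)` verbs):

```r
# pull the tidy coefficient table of the TV model out of the nested tibble
avd_lm_models_nest %>%
  filter(medium == "TV") %>%
  pull(model_params) %>%
  pluck(1)
```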
The functions `\(\texttt{tidy}\)`, `\(\texttt{glance}\)` and `\(\texttt{augment}\)` pull all the information off of the model output, and arrange it in a tidy way. --- class: animated fadeIn ### Advertising data: nested structure The quantities nested in the tibble can be pulled out, or they can be *expanded* within the tibble itself, using `\(\texttt{unnest}\)` ```r avd_lm_models_nest %>% unnest(model_params) ``` ``` ## # A tibble: 6 × 10 ## medium datasets model_output term estimate std.error statistic p.value ## <chr> <list<tibb> <list> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 newspaper [200 × 2] <lm> (Int… 12.4 0.621 19.9 4.71e-49 ## 2 newspaper [200 × 2] <lm> budg… 0.0547 0.0166 3.30 1.15e- 3 ## 3 radio [200 × 2] <lm> (Int… 9.31 0.563 16.5 3.56e-39 ## 4 radio [200 × 2] <lm> budg… 0.202 0.0204 9.92 4.35e-19 ## 5 TV [200 × 2] <lm> (Int… 7.03 0.458 15.4 1.41e-35 ## 6 TV [200 × 2] <lm> budg… 0.0475 0.00269 17.7 1.47e-42 ## # … with 2 more variables: model_metrics <list>, model_fitted <list> ``` --- class: animated fadeIn ### Advertising data: nested structure The quantities nested in the tibble can be pulled out, or they can be *expanded* within the tibble itself, using `\(\texttt{unnest}\)` ```r avd_lm_models_nest %>% unnest(model_params) %>% select(medium,term:p.value) %>% kbl(digits = 4) %>% kable_styling(font_size = 10) %>% row_spec(1:2,background = "#D9FDEC") %>% row_spec(3:4,background = "#CAF6FC") %>% row_spec(5:6,background = "#FDB5BA") ``` <table class="table" style="font-size: 10px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> medium </th> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #D9FDEC !important;"> newspaper </td> <td style="text-align:left;background-color: #D9FDEC 
!important;"> (Intercept) </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 12.3514 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.6214 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 19.8761 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #D9FDEC !important;"> newspaper </td> <td style="text-align:left;background-color: #D9FDEC !important;"> budget </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0547 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0166 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 3.2996 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0011 </td> </tr> <tr> <td style="text-align:left;background-color: #CAF6FC !important;"> radio </td> <td style="text-align:left;background-color: #CAF6FC !important;"> (Intercept) </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 9.3116 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.5629 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 16.5422 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #CAF6FC !important;"> radio </td> <td style="text-align:left;background-color: #CAF6FC !important;"> budget </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.2025 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0204 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 9.9208 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #FDB5BA !important;"> TV </td> <td style="text-align:left;background-color: #FDB5BA 
!important;"> (Intercept) </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 7.0326 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.4578 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 15.3603 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #FDB5BA !important;"> TV </td> <td style="text-align:left;background-color: #FDB5BA !important;"> budget </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0475 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0027 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 17.6676 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0000 </td> </tr> </tbody> </table> -- The effect of budget on sales differs significantly from 0, for all the considered media --- class: animated fadeIn ### Ordinary least squares ** `$$\min_{\hat{\beta}_{0},\hat{\beta}_{1}}: RSS=\sum_{i=1}^{n}{e_{i}^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{y}_{i})^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}$$` ** -- ### OLS estimator of `\(\beta_{0}\)` `$$\begin{split}&\partial_{\hat{\beta}_{0}}\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}= -2\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})}=\\ &\sum_{i=1}^{n}{y_{i}}-n\hat{\beta}_{0}-\hat{\beta}_{1}\sum_{i=1}^{n}{x_{i}}=0\rightarrow \hat{\beta}_{0}=\bar{y}-\hat{\beta}_{1}\bar{x}\end{split}$$` --- class: animated fadeIn ### Ordinary least squares ** `$$\min_{\hat{\beta}_{0},\hat{\beta}_{1}}\sum_{i=1}^{n}{e_{i}^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{y}_{i})^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}$$` ** ### OLS estimator of `\(\beta_{1}\)` `$$\begin{split}&\partial_{\hat{\beta}_{1}}\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}= 
-2\sum_{i=1}^{n}{x_{i}(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})}= \sum_{i=1}^{n}{x_{i}y_{i}}-\hat{\beta}_{0}\sum_{i=1}^{n}{x_{i}}-\hat{\beta}_{1}\sum_{i=1}^{n}{x_{i}^{2}}=0\\ &\hat{\beta}_{1}\sum_{i=1}^{n}{x_{i}^{2}}=\sum_{i=1}^{n}{x_{i}y_{i}}-\sum_{i=1}^{n}{x_{i}}\left(\frac{\sum_{i=1}^{n}{y_{i}}}{n}-\hat{\beta}_{1}\frac{\sum_{i=1}^{n}{x_{i}}}{n}\right)\\ &\hat{\beta}_{1}\left(n\sum_{i=1}^{n}{x_{i}^{2}}-(\sum_{i=1}^{n}{x_{i}})^{2} \right)= n\sum_{i=1}^{n}{x_{i}y_{i}}-\sum_{i=1}^{n}{x_{i}}\sum_{i=1}^{n}{y_{i}}\\ &\hat{\beta}_{1}=\frac{n\sum_{i=1}^{n}{x_{i}y_{i}}-\sum_{i=1}^{n}{x_{i}}\sum_{i=1}^{n}{y_{i}}} {n\sum_{i=1}^{n}{x_{i}^{2}}-(\sum_{i=1}^{n}{x_{i}})^{2} }=\frac{\sigma_{xy}}{\sigma^{2}_{x}} \end{split}$$` --- class: animated fadeIn ### Three regression lines ```r *three_regs_plot=avd_lm_models_nest %>% unnest(model_fitted) %>% * select(medium,sales:.fitted) %>% * ggplot(aes(x=budget,y=sales))+theme_minimal()+facet_grid(~medium,scales="free")+ geom_point(alpha=.5,color = "indianred")+ geom_smooth(method="lm")+ geom_segment(aes(x=budget,xend=budget,y=.fitted,yend=sales),color="forestgreen",alpha=.25) ``` <img src="Linear-Regression-part_1_files/figure-html/unnamed-chunk-1-1.png" width="40%" style="display: block; margin: auto;" /> --- class: animated fadeIn ### Three regression lines ```r *three_regs_plot = avd_lm_models_nest %>% unnest(model_fitted) %>% * select(medium,sales:.fitted) %>% * ggplot(aes(x=budget,y=sales))+theme_minimal()+facet_grid(~medium,scales="free")+ * geom_point(alpha=.5,color = "indianred")+ geom_smooth(method="lm")+ geom_segment(aes(x=budget,xend=budget,y=.fitted,yend=sales),color="forestgreen",alpha=.25) ``` <img src="Linear-Regression-part_1_files/figure-html/unnamed-chunk-2-1.png" width="40%" style="display: block; margin: auto;" /> --- class: animated fadeIn ### Three regression lines ```r *three_regs_plot=avd_lm_models_nest %>% unnest(model_fitted) %>% * select(medium,sales:.fitted) %>% * 
ggplot(aes(x=budget,y=sales))+theme_minimal()+facet_grid(~medium,scales="free")+ * geom_point(alpha=.5,color = "indianred")+ * geom_smooth(method="lm")+ geom_segment(aes(x=budget,xend=budget,y=.fitted,yend=sales),color="forestgreen",alpha=.25) ``` <img src="Linear-Regression-part_1_files/figure-html/unnamed-chunk-3-1.png" width="40%" style="display: block; margin: auto;" /> --- class: animated fadeIn ### Three regression lines ```r *three_regs_plot=avd_lm_models_nest %>% unnest(model_fitted) %>% * select(medium,sales:.fitted) %>% * ggplot(aes(x=budget,y=sales))+theme_minimal()+facet_grid(~medium,scales="free")+ * geom_point(alpha=.5,color = "indianred")+ * geom_smooth(method="lm")+ * geom_segment(aes(x=budget,xend=budget,y=.fitted,yend=sales),color="forestgreen",alpha=.25) ``` <img src="Linear-Regression-part_1_files/figure-html/unnamed-chunk-4-1.png" width="40%" style="display: block; margin: auto;" /> --- class: animated fadeIn ### Advertising data: nested structure The quantities nested in the tibble can be pulled out, or they can be *expanded* within the tibble itself, using `\(\texttt{unnest}\)` ```r avd_lm_models_nest %>% unnest(model_metrics) %>% select(medium,r.squared:df.residual) %>% kbl(digits = 4) %>% kable_styling(font_size = 10) %>% row_spec(1,background = "#D9FDEC") %>% row_spec(2,background = "#CAF6FC") %>% row_spec(3,background = "#FDB5BA") ``` <table class="table" style="font-size: 10px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> medium </th> <th style="text-align:right;"> r.squared </th> <th style="text-align:right;"> adj.r.squared </th> <th style="text-align:right;"> sigma </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> logLik </th> <th style="text-align:right;"> AIC </th> <th style="text-align:right;"> BIC </th> <th style="text-align:right;"> deviance </th> <th 
style="text-align:right;"> df.residual </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #D9FDEC !important;"> newspaper </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0521 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0473 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 5.0925 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 10.8873 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0011 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 1 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> -608.3357 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 1222.671 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 1232.566 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 5134.805 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 198 </td> </tr> <tr> <td style="text-align:left;background-color: #CAF6FC !important;"> radio </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.3320 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.3287 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 4.2749 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 98.4216 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0000 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 1 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> -573.3369 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 1152.674 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 1162.569 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 3618.479 </td> <td style="text-align:right;background-color: 
#CAF6FC !important;"> 198 </td> </tr> <tr> <td style="text-align:left;background-color: #FDB5BA !important;"> TV </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.6119 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.6099 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 3.2587 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 312.1450 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0000 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 1 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> -519.0457 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 1044.091 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 1053.986 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 2102.531 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 198 </td> </tr> </tbody> </table> --- class: animated fadeIn ### Linear regression: model assumptions The linear regression model is ** `$$y_{i}=\beta_{0}+\beta_{1}x_{i}+\epsilon_{i}$$` ** where ** `\(\epsilon_{i}\)` ** is a random variable with expected value **0**. For inference, further assumptions must be made: - ** `\(\epsilon_{i}\sim N(0,\sigma^{2})\)` **, so the variance of the errors ** `\(\sigma^{2}\)` ** does not depend on ** `\(x_{i}\)` **. - ** `\(Cov(\epsilon_{i},\epsilon_{i'})=0\)` **, for all `\(i\neq i'\)` and `\(i,i'=1,\ldots,n\)`. 
- ** `\(x_{i}\)` ** non-stochastic -- It follows that ** `\(y_{i}\)` ** is a random variable such that - ** `\(E[y_{i}]=\beta_{0}+\beta_{1}x_{i}\)` ** - ** `\(Var[y_{i}]=\sigma^{2}\)` ** --- class: animated fadeIn ### Linear regression: model assumptions <center> <img src="./figures/RegAssum.png" alt="RegAssum" height="500px" /> </center> <center> Statistics for Business and Economics (Anderson, Sweeney and Williams, (2011)) </center> --- class: animated fadeIn ### `\(\sigma^{2}\)` estimator The variance ** `\(\sigma^{2}\)` ** is assumed to be constant, but it is unknown, and it has to be estimated: - since ** `\(y_{i}\sim N(\beta_{0}+\beta_{1}x_{i},\sigma^{2})\)` **, it follows that ** `$$\frac{y_{i}-\beta_{0}-\beta_{1}x_{i}}{\sigma}\sim N(0,1)$$` ** -- - furthermore, recall that ** `\(\sum_{i=1}^{n}\left[N(0,1)\right]^{2} = \chi^{2}_{n}\)` **, then ** `$$\sum_{i=1}^{n}\left[N(0,1)\right]^{2}=\sum_{i=1}^{n}\left[\frac{y_{i}-\beta_{0}-\beta_{1}x_{i}}{\sigma}\right]^{2}=\chi^{2}_{n}$$` ** -- - replacing ** `\(\beta_{0}\)` ** and ** `\(\beta_{1}\)` ** with their estimators ** `\(\hat{\beta}_{0}\)` ** and ** `\(\hat{\beta}_{1}\)` **, the previous becomes ** `$$\frac{\sum_{i=1}^{n}\left(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i}\right)^{2}}{\sigma^{2}}=\frac{RSS}{\sigma^{2}}=\chi^{2}_{n-2}$$` ** two degrees of freedom are lost because the two parameters are replaced by their estimators. --- class: animated fadeIn ### `\(\sigma^{2}\)` estimator - Finally, since ** `\(E\left[\chi^{2}_{n-2}\right]=n-2\)` ** then ** `$$E\left[\chi^{2}_{n-2}\right]=E\left[\frac{RSS}{\sigma^{2}}\right]=n-2$$` ** - and because ** `\(\sigma^{2}\)` ** is constant, it can be pulled out of the expectation to give ** `$$\frac{E\left[RSS\right]}{\sigma^{2}}=n-2$$` ** - it follows that ** `$$\sigma^{2}=E\left[\frac{RSS}{n-2}\right]$$` ** and ** `\(\frac{RSS}{n-2}\)` ** is an unbiased estimator of `\(\sigma^{2}\)`. 
- ** `\(\sqrt{\frac{RSS}{n-2}}=RSE\)` **, the so-called **residual standard error** --- class: animated fadeIn ### `\(\hat{\beta}_{1}\)` as a linear combination of `\(y_{i}\)` `$$\begin{split} \hat{\beta}_{1}&= \frac{\sum_{i=1}^{n}{\left(y_{i}-\bar{y}\right)\left(x_{i}-\bar{x}\right)} }{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}} =\frac{\sum_{i=1}^{n}{\left[y_{i} \left(x_{i}-\bar{x}\right) -\bar{y}\left(x_{i}-\bar{x}\right)\right]} }{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}}=\\ &=\frac{\sum_{i=1}^{n}{y_{i} \left(x_{i}-\bar{x}\right)} -\bar{y}\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)}}{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}} = \sum_{i=1}^{n}{y_{i} \frac{\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}}}= \sum_{i=1}^{n}{w_{i}y_{i}}\end{split}$$` Given a linear combination `\(U_{i}=c+d V_{i}\)`, if `\(V_{i}\sim N(\mu_{v},\sigma^{2}_{v})\)`, then `\(U_{i}\sim N(c+d\mu_{v},d^{2}\sigma^{2}_{v})\)`. Since `\(Y_{i}\sim N(\beta_{0}+\beta_{1}x_{i},\sigma^{2})\)`, and `\(\hat{\beta}_{1}=\sum_{i=1}^{n}{w_{i}y_{i}}\)` then `\(c=0\)` and `\(d=w_{i}\)`, then ** `$$\hat{\beta}_{1}\sim N(\sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})},\sum_{i=1}^{n}{w_{i}^{2}\sigma^{2}})$$` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)` The expectation of ** `\(\hat{\beta}_{1}\)` ** is - ** `\(E\left[\hat{\beta}_{1}\right] = \sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)` ** -- ** `\(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)` The expectation of ** `\(\hat{\beta}_{1}\)` ** is - ** `\(E\left[\hat{\beta}_{1}\right] = 
\sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)` ** ** `\(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\beta_{1}\frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)` The expectation of ** `\(\hat{\beta}_{1}\)` ** is - ** `\(E\left[\hat{\beta}_{1}\right] = \sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)` ** ** `\(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\beta_{1}\frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{\sum_{i=1}^{n}{x_{i}^2}-\bar{x}\sum_{i=1}^{n}{x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)` The expectation of ** `\(\hat{\beta}_{1}\)` ** is - ** `\(E\left[\hat{\beta}_{1}\right] = \sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)` ** ** 
`\(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\beta_{1}\frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{\sum_{i=1}^{n}{x_{i}^2}-\bar{x}\sum_{i=1}^{n}{x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\left(\sum_{i=1}^{n}{x_{i}^2}-\frac{\sum_{i=1}^{n}{x_{i}}}{n}\sum_{i=1}^{n}{x_{i}}\right)=\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)` The expectation of ** `\(\hat{\beta}_{1}\)` ** is - ** `\(E\left[\hat{\beta}_{1}\right] = \sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)` ** ** `\(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\beta_{1}\frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{\sum_{i=1}^{n}{x_{i}^2}-\bar{x}\sum_{i=1}^{n}{x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\left(\sum_{i=1}^{n}{x_{i}^2}-\frac{\sum_{i=1}^{n}{x_{i}}}{n}\sum_{i=1}^{n}{x_{i}}\right)=\)` ** ** `\(=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n}=\)` ** Note: `\(\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n} \frac{1}{n} =\frac{n\sum_{i=1}^{n}{x_{i}^2}}{n^{2}}-\frac{\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n^{2}}=\frac{\sum_{i=1}^{n}{x_{i}^2}}{n}-\bar{x}^{2}=var(x)\)` -- therefore `\(\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n} = var(x)n=\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\)` -- ** 
`\(=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n}=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}=\beta_{1}\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)` The variance of ** `\(\hat{\beta}_{1}\)` ** is ** `\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}}=\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)` The variance of ** `\(\hat{\beta}_{1}\)` ** is ** `\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}} =\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)` The variance of ** `\(\hat{\beta}_{1}\)` ** is ** `\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}} =\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}=\sum_{i=1}^{n}{\frac{(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)` The variance of ** `\(\hat{\beta}_{1}\)` ** is ** `\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}}
=\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}=\sum_{i=1}^{n}{\frac{(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)` The variance of ** `\(\hat{\beta}_{1}\)` ** is ** `\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}} =\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}=\sum_{i=1}^{n}{\frac{(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\sigma^{2}\)` ** --- class: animated fadeIn ### Parameters of the `\(\hat{\beta}_{1}\)` distribution: `\(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)` The variance of ** `\(\hat{\beta}_{1}\)` ** is ** `\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}} =\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}=\sum_{i=1}^{n}{\frac{(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\sigma^{2}\)` ** -- And the standard error of ** `\(\hat{\beta}_{1}\)` ** is ** `\(SE\left(\hat{\beta}_{1}\right) = \sqrt{\frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\)` ** --- class: animated fadeIn ### Inference on `\(\hat{\beta}_{1}\)`: hypothesis testing The null hypothesis is ** `\(\beta_{1}=0\)` ** and the test statistic is ** 
`$$\frac{\hat{\beta}_{1}-\beta_{1}}{SE(\hat{\beta}_{1})}$$` ** that, under the null hypothesis, becomes just ** `\(\frac{\hat{\beta}_{1}}{SE(\hat{\beta}_{1})}\)` ** distributed as a *Student t* with `\(n-2\)` d.f. ** `$$\text{Reject } H_{0} \text{ if } \left|\frac{\hat{\beta}_{1}}{SE(\hat{\beta}_{1})}\right|>t_{1-\frac{\alpha}{2},n-2}$$` ** --- class: animated fadeIn ### Linear regression: multiple predictors - ** `\(y\)` ** is the numeric response; ** `\(X_{1},\ldots,X_{p}\)` ** are the predictors; the general formula for the linear model is ** `$$y = f(X)+\epsilon=\beta_{0}+\sum_{j=1}^{p}X_{j}\beta_{j}+ \epsilon$$` ** -- In algebraic form ** `$${\bf y}={\bf X}{\bf \beta}+{\bf \epsilon}$$` ** `$$\begin{bmatrix} y_{1}\\ y_{2}\\ y_{3}\\ \vdots\\ y_{n} \end{bmatrix}=\begin{bmatrix} 1& x_{1,1}&\ldots&x_{1,p}\\ 1& x_{2,1}&\ldots&x_{2,p}\\ 1& x_{3,1}&\ldots&x_{3,p}\\ \vdots&\vdots&\ddots&\vdots\\ 1& x_{n,1}&\ldots&x_{n,p}\\ \end{bmatrix}\begin{bmatrix} \beta_{0}\\ \beta_{1}\\ \vdots\\ \beta_{p}\\ \end{bmatrix}+\begin{bmatrix} \epsilon_{1}\\ \epsilon_{2}\\ \epsilon_{3}\\ \vdots\\ \epsilon_{n} \end{bmatrix}$$` --- class: animated fadeIn ### Linear regression: ordinary least squares (OLS) ** OLS target ** ** `$$\min_{\hat{\beta}_{0},\hat{\beta}_{1},\ldots,\hat{\beta}_{p}}\sum_{i=1}^{n}\left(y_{i}-\hat{\beta}_{0}-\sum_{j=1}^{p}x_{ij}\hat{\beta}_{j}\right)^{2}=\min_{\hat{\beta}_{0},\hat{\beta}_{1},\ldots,\hat{\beta}_{p}}RSS$$` ** -- ** OLS target (algebraic) ** ### $$ \min_{\hat{\beta}} ({\bf y}-{\bf X}{\bf \hat{\beta}})^{\sf T}({\bf y}-{\bf X}{\bf \hat{\beta}})$$ --- class: animated fadeIn ### Linear regression: ordinary least squares (OLS) solution ** OLS target ** ** `$$\min_{\hat{\beta}}({\bf y}-{\bf X}{\bf \hat{\beta}})^{\sf T}({\bf y}-{\bf X}{\bf \hat{\beta}})=\min_{\hat{\beta}}\left({\bf y}^{\sf T}{\bf y}-{\bf y}^{\sf T}{\bf X} {\bf \hat{\beta}}-{\bf{\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf y}+{\bf {\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf X} {\bf \hat{\beta}}\right)$$` ** -- ** First
order conditions ** ** `$$\partial_{\hat{\beta}}\left({\bf y}^{\sf T}{\bf y}-{\bf y}^{\sf T}{\bf X} {\bf \hat{\beta}}-{\bf{\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf y}+{\bf {\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf X} {\bf \hat{\beta}}\right)=0$$` ** -- Since ** `\(\partial_{\bf\hat{\beta}}\left({\bf y}^{\sf T}{\bf X} {\bf \hat{\beta}}\right)=\partial_{\bf\hat{\beta}}\left({ \bf {\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf y}\right)={\bf X}^{\sf T}{\bf y}\)` ** it follows that ** `$$\partial_{\hat{\beta}}RSS=-2{\bf X}^{\sf T}{\bf y}+2{\bf X}^{\sf T}{\bf X} {\bf \hat{\beta}}=0$$` ** And the ** OLS solution ** is ### $$ \hat{\beta} = ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf y} $$ --- class: animated fadeIn ### Distributional assumptions Assuming ** `\(\epsilon \sim N(0,\sigma^{2}{\bf I})\)` ** and non-stochastic predictors, it follows that ** `$$E[{\bf y}]= E[{\bf X}\beta+\epsilon]={\bf X}\beta \ \ \text{ and } \ \ var[{\bf y}]= \sigma^{2}{\bf I}$$` ** -- ### Distribution of linear combinations ** $$ \text{if } \quad {\bf U}\sim N({\bf \mu},{\bf\Sigma}) \quad \text{ and } \quad {\bf V} = {\bf c}+{\bf D}{\bf U} \quad \text{then} \quad {\bf V}\sim N({\bf c}+{\bf D}\mu,{\bf D}{\bf \Sigma}{\bf D}^{\sf T})$$ ** -- ### Distribution of `\(\hat{\beta}\)` ** `\(\hat{\beta} = ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf y}\)` ** is a linear combination of ** `\(\bf y\)` **, with ** `\({\bf c} = 0\)` ** and ** `\({\bf D} = ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}\)` **, then -- - ** `\(E[\hat{\beta}]={\bf D} E[y] = {\bf D}{\bf X}\beta= ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf X}\beta = \beta\)` ** -- - ** `\(var(\hat{\beta})= {\bf D} \sigma^{2}{\bf D}^{\sf T}=({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}\sigma^{2}[({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}]^{\sf T} = \sigma^{2}({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf X}({\bf X}^{\sf T}{\bf X})^{-1}=\sigma^{2}({\bf X}^{\sf T}{\bf X})^{-1}\)` ** --- class: animated fadeIn ### Advertising data: multiple
linear regression At this stage there is no pre-processing and no test-set evaluation: just fit the regression model on the whole data set ** pre-processing: define a `\(\texttt{recipe}\)` ** ```r adv_recipe = recipe(sales~., data=adv_data) ``` ** model specification: define a `\(\texttt{parsnip}\)` model ** ```r adv_model = linear_reg(mode="regression", engine="lm") ``` ** define the `\(\texttt{workflow}\)` ** ```r adv_wflow = workflow() %>% add_recipe(recipe=adv_recipe) %>% add_model(adv_model) ``` ** fit the model ** ```r adv_fit = adv_wflow %>% fit(data=adv_data) ``` --- class: animated fadeIn ### Advertising data: back to single models .pull-left[ <table class="table" style="font-size: 10px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> medium </th> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #D9FDEC !important;"> newspaper </td> <td style="text-align:left;background-color: #D9FDEC !important;"> (Intercept) </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 12.3514 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.6214 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 19.8761 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #D9FDEC !important;"> newspaper </td> <td style="text-align:left;background-color: #D9FDEC !important;"> budget </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0547 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0166 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 3.2996 </td> <td
style="text-align:right;background-color: #D9FDEC !important;"> 0.0011 </td> </tr> <tr> <td style="text-align:left;background-color: #CAF6FC !important;"> radio </td> <td style="text-align:left;background-color: #CAF6FC !important;"> (Intercept) </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 9.3116 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.5629 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 16.5422 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #CAF6FC !important;"> radio </td> <td style="text-align:left;background-color: #CAF6FC !important;"> budget </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.2025 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0204 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 9.9208 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #FDB5BA !important;"> TV </td> <td style="text-align:left;background-color: #FDB5BA !important;"> (Intercept) </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 7.0326 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.4578 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 15.3603 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #FDB5BA !important;"> TV </td> <td style="text-align:left;background-color: #FDB5BA !important;"> budget </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0475 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0027 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 17.6676 </td> <td 
style="text-align:right;background-color: #FDB5BA !important;"> 0.0000 </td> </tr> </tbody> </table> ] .pull-right[ <img src="Linear-Regression-part_1_files/figure-html/unnamed-chunk-5-1.png" width="75%" style="display: block; margin: auto;" /> ] --- class: animated fadeIn ### Advertising data: single vs multiple regression .pull-left[ ```r avd_lm_models_nest %>% unnest(model_params) %>% select(medium,term:p.value) %>% filter(term!="(Intercept)") %>% arrange(medium) %>% kbl(digits = 4) %>% kable_styling(font_size = 12) %>% column_spec(c(3,6),bold=TRUE) %>% row_spec(1,background = "#D9FDEC") %>% row_spec(2,background = "#CAF6FC") %>% row_spec(3,background = "#FDB5BA") ``` <table class="table" style="font-size: 12px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> medium </th> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #D9FDEC !important;"> newspaper </td> <td style="text-align:left;background-color: #D9FDEC !important;"> budget </td> <td style="text-align:right;font-weight: bold;background-color: #D9FDEC !important;"> 0.0547 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0166 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 3.2996 </td> <td style="text-align:right;font-weight: bold;background-color: #D9FDEC !important;"> 0.0011 </td> </tr> <tr> <td style="text-align:left;background-color: #CAF6FC !important;"> radio </td> <td style="text-align:left;background-color: #CAF6FC !important;"> budget </td> <td style="text-align:right;font-weight: bold;background-color: #CAF6FC !important;"> 0.2025 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 0.0204 </td> <td style="text-align:right;background-color: 
#CAF6FC !important;"> 9.9208 </td> <td style="text-align:right;font-weight: bold;background-color: #CAF6FC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #FDB5BA !important;"> TV </td> <td style="text-align:left;background-color: #FDB5BA !important;"> budget </td> <td style="text-align:right;font-weight: bold;background-color: #FDB5BA !important;"> 0.0475 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0027 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 17.6676 </td> <td style="text-align:right;font-weight: bold;background-color: #FDB5BA !important;"> 0.0000 </td> </tr> </tbody> </table> ] .pull-right[ ```r adv_fit %>% tidy() %>% filter(term!="(Intercept)") %>% arrange(term) %>% kbl(digits = 4) %>% kable_styling(font_size = 12) %>% column_spec(c(2,5),bold=TRUE) %>% row_spec(1,background = "#D9FDEC") %>% row_spec(2,background = "#CAF6FC") %>% row_spec(3,background = "#FDB5BA") ``` <table class="table" style="font-size: 12px; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #D9FDEC !important;"> newspaper </td> <td style="text-align:right;font-weight: bold;background-color: #D9FDEC !important;"> -0.0010 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> 0.0059 </td> <td style="text-align:right;background-color: #D9FDEC !important;"> -0.1767 </td> <td style="text-align:right;font-weight: bold;background-color: #D9FDEC !important;"> 0.8599 </td> </tr> <tr> <td style="text-align:left;background-color: #CAF6FC !important;"> radio </td> <td style="text-align:right;font-weight: bold;background-color: #CAF6FC !important;"> 0.1885 </td> <td 
style="text-align:right;background-color: #CAF6FC !important;"> 0.0086 </td> <td style="text-align:right;background-color: #CAF6FC !important;"> 21.8935 </td> <td style="text-align:right;font-weight: bold;background-color: #CAF6FC !important;"> 0.0000 </td> </tr> <tr> <td style="text-align:left;background-color: #FDB5BA !important;"> TV </td> <td style="text-align:right;font-weight: bold;background-color: #FDB5BA !important;"> 0.0458 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 0.0014 </td> <td style="text-align:right;background-color: #FDB5BA !important;"> 32.8086 </td> <td style="text-align:right;font-weight: bold;background-color: #FDB5BA !important;"> 0.0000 </td> </tr> </tbody> </table> ] -- .center[** why these (apparently) contradictory results? **] --- class: animated fadeIn ### Advertising data: single vs multiple regression ```r library("corrr") adv_data %>% correlate() %>% network_plot() ``` <img src="Linear-Regression-part_1_files/figure-html/unnamed-chunk-6-1.png" width="30%" style="display: block; margin: auto;" /> **Note**: - ** `\(\texttt{newspaper}\)` ** is correlated with ** `\(\texttt{radio}\)` **, the latter having a significant effect on ** `\(\texttt{sales}\)` ** in the multiple regression model -- - that is why, in the single model, ** `\(\texttt{newspaper}\)` ** has a significant effect on ** `\(\texttt{sales}\)` ** (** `\(\texttt{radio}\)` ** is ignored) --- class: animated fadeIn center middle inverse # regression by successive orthogonalizations --- class: animated fadeIn center middle inverse ### computing multiple regression coefficients ### by means of single regressions --- class: animated fadeIn ### Algebraic formalization of single regression Consider the data to be centered ** `\(\bar{y}=\bar{x}=0\)` **: this means that ** `\(\hat{\beta}_{0}=0\)` ** (no intercept model), ** `\({\bf y}=\beta_{1}{\bf x}+\epsilon\)` ** -- The ** `\(\hat{\beta}_{1}\)` ** estimator **
`$$\hat{\beta}_{1}=\frac{\sum_{i=1}^{n}{x_{i}y_{i}}}{\sum_{i=1}^{n}{x_{i}^{2}}}=\frac{{\bf x}^{\sf T}{\bf y}}{{\bf x}^{\sf T}{\bf x}}=\frac{\langle {\bf x},{\bf y}\rangle}{\langle {\bf x},{\bf x}\rangle}$$` ** `\({\langle {\bf a},{\bf b}\rangle}\)` is the inner product `\({\bf a}^{\sf T}{\bf b}\)`. -- Consider the two predictors model ** `\(y = \beta_{1}X_{1}+\beta_{2}X_{2}+\epsilon\)` ** - **Note** if the two predictors are such that `\({\bf x}_{1}^{\sf T}{\bf x}_{2} = 0\)` then ** `\(\hat{\beta}_{1}\)` ** and ** `\(\hat{\beta}_{2}\)` ** can be equivalently computed by - fitting the multiple regression ** `\(y = \beta_{1}X_{1}+\beta_{2}X_{2}+\epsilon\)` ** - or by fitting the single regressions ** `\(y = \beta_{1}X_{1}+\epsilon\)` ** and ** `\(y = \beta_{2}X_{2}+\epsilon\)` ** --- class: animated fadeIn ### the orthogonal predictors case Check the previous claim, given the predictors matrix `$${\bf X}=\left[{ \begin{bmatrix} \\ \\ {\bf x}_{1} \\ \\ \\ \end{bmatrix} \begin{bmatrix} \\ \\ {\bf x}_{2} \\ \\ \\ \end{bmatrix}} \right]$$` and that ** `$${\bf \hat{\beta}}=\left({\bf X}^{\sf T}{\bf X}\right)^{-1}{\bf X}^{\sf T}{\bf y} \rightarrow \left({\bf X}^{\sf T}{\bf X}\right)\hat{\beta} - {\bf X}^{\sf T}{\bf y}={\bf X}^{\sf T}\left({\bf X}\hat{\beta} - {\bf y}\right)=0$$` ** --- class: animated fadeIn ### the non-orthogonal predictors case In real data, the predictors are hardly ever all pairwise orthogonal. -- - single and multiple regression coefficient estimates will differ -- - multiple regression coefficient estimates can still be obtained from single regressions via...
-- .center[ **successive orthogonalizations** ] --- class: animated fadeIn ### Regression via successive orthogonalizations Consider the four predictors model (with no intercept) ** `$$y=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\beta_{4}X_{4}+\epsilon$$` ** -- ** Compute the value for `\(\hat{\beta}_{4}\)` using successive orthogonalizations ** set ** `\({\bf z}_{1}={\bf x}_{1}\)` ** .my-pull-left[ ** step 1 ** ] .my-pull-right[ > - fit ** `\({\bf x}_{2}=\beta_{2|1}{\bf z}_{1}+\epsilon\)` ** > > - compute ** `\(\hat{\beta}_{2|1}=\frac{\langle {\bf z}_{1},{\bf x}_{2}\rangle}{\langle {\bf z}_{1},{\bf z}_{1}\rangle}\)` ** > > - compute ** `\({\bf z}_{2}={\bf x}_{2}-\hat{\beta}_{2|1}{\bf z}_{1}\)` ** ] --- class: animated fadeIn ### Regression via successive orthogonalizations Consider the four predictors model (with no intercept) ** `$$y=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\beta_{4}X_{4}+\epsilon$$` ** ** Compute the value for `\(\hat{\beta}_{4}\)` using successive orthogonalizations ** .my-pull-left[ ** step 2 ** ] .my-pull-right[ > - fit ** `\({\bf x}_{3}=\beta_{3|1}{\bf z}_{1}+\epsilon\)` ** and ** `\({\bf x}_{3}=\beta_{3|2}{\bf z}_{2}+\epsilon\)` ** > > - compute ** `\(\hat{\beta}_{3|1}=\frac{\langle {\bf z}_{1},{\bf x}_{3}\rangle}{\langle {\bf z}_{1},{\bf z}_{1}\rangle}\)` ** and ** `\(\hat{\beta}_{3|2}=\frac{\langle {\bf z}_{2},{\bf x}_{3}\rangle}{\langle {\bf z}_{2},{\bf z}_{2}\rangle}\)` ** > > - compute ** `\({\bf z}_{3}={\bf x}_{3}-\hat{\beta}_{3|1}{\bf z}_{1}-\hat{\beta}_{3|2}{\bf z}_{2}\)` ** ] --- class: animated fadeIn ### Regression via successive orthogonalizations Consider the four predictors model (with no intercept) ** `$$y=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\beta_{4}X_{4}+\epsilon$$` ** ** Compute the value for `\(\hat{\beta}_{4}\)` using successive orthogonalizations ** .my-pull-left[ ** step 3 ** ] .my-pull-right[ > - fit ** `\({\bf x}_{4}=\beta_{4|1}{\bf z}_{1}+\epsilon\)` **,** `\({\bf x}_{4}=\beta_{4|2}{\bf z}_{2}+\epsilon\)` ** and
** `\({\bf x}_{4}=\beta_{4|3}{\bf z}_{3}+\epsilon\)` ** > > - compute ** `\(\hat{\beta}_{4|1}=\frac{\langle {\bf z}_{1},{\bf x}_{4}\rangle}{\langle {\bf z}_{1},{\bf z}_{1}\rangle}\)` **, ** `\(\hat{\beta}_{4|2}=\frac{\langle {\bf z}_{2},{\bf x}_{4}\rangle}{\langle {\bf z}_{2},{\bf z}_{2}\rangle}\)` ** and ** `\(\hat{\beta}_{4|3}=\frac{\langle {\bf z}_{3},{\bf x}_{4}\rangle}{\langle {\bf z}_{3},{\bf z}_{3}\rangle}\)` ** > > - compute ** `\({\bf z}_{4}={\bf x}_{4}-\hat{\beta}_{4|1}{\bf z}_{1}-\hat{\beta}_{4|2}{\bf z}_{2}-\hat{\beta}_{4|3}{\bf z}_{3}\)` ** ] --- class: animated fadeIn ### Regression via successive orthogonalizations Consider the four predictors model (with no intercept) ** `$$y=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\beta_{4}X_{4}+\epsilon$$` ** ** Compute the value for `\(\hat{\beta}_{4}\)` using successive orthogonalizations ** .my-pull-left[ ** step 4 ** ] .my-pull-right[ > - fit ** `\({\bf y}=\beta_{4}{\bf z}_{4}+\epsilon\)` ** > > - compute ** `\(\hat{\beta}_{4}=\frac{\langle {\bf z}_{4},{\bf y}\rangle}{\langle {\bf z}_{4},{\bf z}_{4}\rangle}\)` ** ] --- class: animated fadeIn ### Regression via successive orthogonalizations: a further look - The residuals vector ** `\({\bf e}=\left({\bf y}-{\bf \hat{y}}\right)\)` ** is orthogonal to the predictor ** `\({\bf x}\)` **, that is ** `\(\left({\bf y}-{\bf \hat{y}}\right)^{\sf T}{\bf x}=0\)` ** `$$\begin{split} ({\bf y}-{\bf \hat{y}})^{\sf T}{\bf x}&={\bf y}^{\sf T}{\bf x}-{\bf \hat{y}}^{\sf T}{\bf x} \\ &={\bf y}^{\sf T}{\bf x} - \underbrace{({\bf x} ({\bf x}^{\sf T}{\bf x})^{-1}{\bf x}^{\sf T}{\bf y})^{\sf T}}_{\bf \hat{y}^{\sf T}}{\bf x} \\ &={\bf y}^{\sf T}{\bf x}- {\bf y}^{\sf T}{\bf x} ({\bf x}^{\sf T}{\bf x})^{-1}{\bf x}^{\sf T}{\bf x}= {\bf y}^{\sf T}{\bf x}- {\bf y}^{\sf T}{\bf x}=0 \end{split}$$` -- - recall that ** `\({\bf z}_{2}={\bf x}_{2}-\hat{\beta}_{2|1}{\bf z}_{1}\)` **: ** `\({\bf z}_{2}\)` ** is orthogonal to ** `\({\bf z}_{1}\)` **, and since ** `\({\bf z}_{1}={\bf x}_{1}\)` **, **
`\({\bf z}_{2}\)` ** is orthogonal to ** `\({\bf x}_{1}\)` **, too. -- - ** `\({\bf z}_{2}\)` ** is ** `\({\bf x}_{2}\)` ** _adjusted_ to be orthogonal to `\({\bf z}_{1}\)` ( `\({\bf x}_{1}\)` ). -- - ** `\({\bf z}_{3}\)` ** is ** `\({\bf x}_{3}\)` ** _adjusted_ to be orthogonal to `\({\bf z}_{2}\)` and to `\({\bf z}_{1}\)`. - ** `\({\bf z}_{4}\)` ** is ** `\({\bf x}_{4}\)` ** _adjusted_ to be orthogonal to `\({\bf z}_{3}\)`, `\({\bf z}_{2}\)` and to `\({\bf z}_{1}\)`. ** the multiple regression-based `\(\hat{\beta}_{j}\)` can be obtained via the single regression `\({\bf y}=\hat{\beta}_{j}{\bf z}_{j}+\epsilon\)` ** - ** `\({\bf z}_{j}\)` ** is ** `\({\bf x}_{j}\)` ** _adjusted_ with respect to the other predictors (orthogonalizing with `\({\bf x}_{j}\)` as the last predictor makes the argument work for any `\(j\)`). --- class: animated fadeIn ### Regression via successive orthogonalizations: some considerations ** the multiple regression-based `\(\hat{\beta}_{j}\)` can be obtained via the single regression `\({\bf y}=\hat{\beta}_{j}{\bf z}_{j}+\epsilon\)` ** - If the predictors are pairwise orthogonal, the single-regression coefficients of predictor `\(j\)` on each residual vanish, `\(\hat{\beta}_{j|i}=0\)`, so `\({\bf z}_{j}={\bf x}_{j}, j= 1,\ldots,p\)`. -- - So ** `\(\hat{\beta}_{j}=\frac{\langle {\bf z}_{j}, {\bf y}\rangle}{\langle {\bf z}_{j}, {\bf z}_{j}\rangle}=\frac{\langle {\bf x}_{j}, {\bf y}\rangle}{\langle {\bf x}_{j}, {\bf x}_{j}\rangle}\)` **, just a single regression of `\({\bf y}\)` on `\({\bf x}_{j}\)` -- - if ** `\({\bf x}_{j}\)` ** is highly correlated with one or more of the other predictors, the squared norm ** `\(\|{\bf z}_{j}\|^{2}\approx 0\)` ** - this makes sense, since the predictor in question adds almost no new information about `\({\bf y}\)` -- - as ** `\(\|{\bf z}_{j}\|^{2} = \langle {\bf z}_{j}, {\bf z}_{j}\rangle\approx 0\)` **, the variability of the OLS estimator explodes, since ** `\(SE(\hat{\beta}_{j})=\frac{\sigma}{\sqrt{\|{\bf z}_{j}\|^{2}}}\)` **
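--- class: animated fadeIn ### Regression via successive orthogonalizations: a numeric check The two claims above can be verified directly in R. This is a minimal sketch on simulated data (all object names, `x1`, `z3`, etc., are illustrative and not from the Advertising example); the intercept is handled by centering, playing the role of a `\({\bf z}_{0}\)`. ```r # numeric check of regression via successive orthogonalizations # (simulated data; every name here is illustrative) set.seed(1) n <- 100 x1 <- rnorm(n) x2 <- 0.5 * x1 + rnorm(n) # x2 correlated with x1 x3 <- rnorm(n) y <- 1 + 2 * x1 - x2 + 0.5 * x3 + rnorm(n) # successive orthogonalizations z1 <- x1 - mean(x1) # x1 adjusted for the intercept z2 <- residuals(lm(x2 ~ z1)) # x2 adjusted for z1 (and the intercept) z3 <- residuals(lm(x3 ~ z1 + z2)) # x3 adjusted for z1 and z2 # the single regression of y on z3 reproduces the multiple-regression coefficient beta3_single <- sum(z3 * y) / sum(z3 * z3) fit <- lm(y ~ x1 + x2 + x3) all.equal(beta3_single, unname(coef(fit)["x3"])) # TRUE # and SE(beta_3) = sigma / ||z3|| all.equal(sigma(fit) / sqrt(sum(z3^2)), summary(fit)$coefficients["x3", "Std. Error"]) # TRUE ``` The first check is the slide's main claim; the second shows why `\(\|{\bf z}_{j}\|^{2}\approx 0\)` inflates `\(SE(\hat{\beta}_{j})\)`: the standard error reported by `\(\texttt{lm}\)` is exactly `\(\hat{\sigma}/\|{\bf z}_{j}\|\)` for the last orthogonalized predictor.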