Advertising data
Code
adv_data = read_csv(file = "./data/Advertising.csv") |> select(-1)
adv_data |> slice_sample(n = 8) |>
  kbl() |> kable_styling(font_size = 10)
| TV | radio | newspaper | sales |
|---|---|---|---|
| 76.4 | 0.8 | 14.8 | 9.4 |
| 66.9 | 11.7 | 36.8 | 9.7 |
| 265.6 | 20.0 | 0.3 | 17.4 |
| 151.5 | 41.3 | 58.5 | 18.5 |
| 23.8 | 35.1 | 65.9 | 9.2 |
| 237.4 | 27.5 | 11.0 | 18.9 |
| 7.8 | 38.9 | 50.6 | 6.6 |
| 197.6 | 23.3 | 14.2 | 16.6 |
sales is the response, indicating the level of sales in a specific market
TV, radio and newspaper are the predictors, indicating the advertising budget spent on the corresponding medium
Advertising data
Code
adv_data_tidy = adv_data |> pivot_longer(names_to = "medium", values_to = "budget", cols = 1:3)
adv_data_tidy |> slice_sample(n = 8) |>
  kbl() |> kable_styling(font_size = 10)
| sales | medium | budget |
|---|---|---|
| 14.8 | radio | 10.1 |
| 19.4 | radio | 33.5 |
| 10.5 | radio | 16.0 |
| 11.8 | newspaper | 23.7 |
| 11.7 | newspaper | 5.9 |
| 19.2 | TV | 193.7 |
| 16.9 | TV | 163.3 |
| 16.6 | TV | 202.5 |
Advertising data
Code
adv_data_tidy |>
  ggplot(aes(x = budget, y = sales)) + theme_minimal() + facet_wrap(~ medium, scales = "free") +
  geom_point(color = "indianred", alpha = .5, size = 3)
Note: the budget spent on TV goes up to 300, while the budgets for radio and newspaper are considerably smaller.
Advertising data: single regressions
We can be naive and regress sales on TV, newspaper and radio, separately.
Advertising data: single regressions
Code
avd_lm_models_nest = adv_data_tidy |>
  group_by(medium) |>
  group_nest(.key = "datasets") |>
  mutate(
    model_output = map(.x = datasets, ~ lm(sales ~ budget, data = .x)),
    model_params = map(.x = model_output, ~ tidy(.x)),
    model_metrics = map(.x = model_output, ~ glance(.x)),
    model_fitted = map(.x = model_output, ~ augment(.x))
  )
# A tibble: 3 × 6
medium datasets model_output model_params model_metrics model_fitted
<chr>     <list<tibble[,2]>> <list>       <list>       <list>        <list>
1 TV [200 × 2] <lm> <tibble> <tibble> <tibble>
2 newspaper [200 × 2] <lm> <tibble> <tibble> <tibble>
3 radio [200 × 2] <lm> <tibble> <tibble> <tibble>
This is a nested data structure with everything stored. The functions tidy, glance and augment pull all the information off the model output and arrange it in a tidy way.
Advertising data: nested structure
The quantities nested in the tibble can be pulled out, or they can be expanded within the tibble itself, using unnest
Code
avd_lm_models_nest |> unnest(model_params)
# A tibble: 6 × 10
medium datasets model_output term estimate std.error statistic p.value
<chr> <list<tibb> <list> <chr> <dbl> <dbl> <dbl> <dbl>
1 TV [200 × 2] <lm> (Int… 7.03 0.458 15.4 1.41e-35
2 TV [200 × 2] <lm> budg… 0.0475 0.00269 17.7 1.47e-42
3 newspaper [200 × 2] <lm> (Int… 12.4 0.621 19.9 4.71e-49
4 newspaper [200 × 2] <lm> budg… 0.0547 0.0166 3.30 1.15e- 3
5 radio [200 × 2] <lm> (Int… 9.31 0.563 16.5 3.56e-39
6 radio [200 × 2] <lm> budg… 0.202 0.0204 9.92 4.35e-19
# ℹ 2 more variables: model_metrics <list>, model_fitted <list>
Advertising data: nested structure
The quantities nested in the tibble can be pulled out, or they can be expanded within the tibble itself, using unnest
Code
avd_lm_models_nest |> unnest(model_params) |>
  select(medium, term:p.value) |>
  kbl(digits = 4) |> kable_styling(font_size = 10) |>
  row_spec(1:2, background = "#D9FDEC") |>
  row_spec(3:4, background = "#CAF6FC") |>
  row_spec(5:6, background = "#FDB5BA")
| medium | term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|
| TV | (Intercept) | 7.0326 | 0.4578 | 15.3603 | 0.0000 |
| TV | budget | 0.0475 | 0.0027 | 17.6676 | 0.0000 |
| newspaper | (Intercept) | 12.3514 | 0.6214 | 19.8761 | 0.0000 |
| newspaper | budget | 0.0547 | 0.0166 | 3.2996 | 0.0011 |
| radio | (Intercept) | 9.3116 | 0.5629 | 16.5422 | 0.0000 |
| radio | budget | 0.2025 | 0.0204 | 9.9208 | 0.0000 |
The effect of budget on sales differs significantly from 0 for all the considered media
Ordinary least squares
\[\min_{\hat{\beta}_{0},\hat{\beta}_{1}} RSS=\sum_{i=1}^{n}{e_{i}^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{y}_{i})^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}\]
OLS estimator of \(\beta_{0}\)
\[\begin{split}&\partial_{\hat{\beta}_{0}}\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}=
-2\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})}=0\\
&\Rightarrow\sum_{i=1}^{n}{y_{i}}-n\hat{\beta}_{0}-\hat{\beta}_{1}\sum_{i=1}^{n}{x_{i}}=0\rightarrow
\hat{\beta}_{0}=\bar{y}-\hat{\beta}_{1}\bar{x}\end{split}\]
Ordinary least squares
\[\min_{\hat{\beta}_{0},\hat{\beta}_{1}}\sum_{i=1}^{n}{e_{i}^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{y}_{i})^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}\]
OLS estimator of \(\beta_{1}\)
\[\begin{split}&\partial_{\hat{\beta}_{1}}\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}=
-2\sum_{i=1}^{n}{x_{i}(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})}=0\Rightarrow
\sum_{i=1}^{n}{x_{i}y_{i}}-\hat{\beta}_{0}\sum_{i=1}^{n}{x_{i}}-\hat{\beta}_{1}\sum_{i=1}^{n}{x_{i}^{2}}=0\\
&\hat{\beta}_{1}\sum_{i=1}^{n}{x_{i}^{2}}=\sum_{i=1}^{n}{x_{i}y_{i}}-\sum_{i=1}^{n}{x_{i}}\left(\frac{\sum_{i=1}^{n}{y_{i}}}{n}-\hat{\beta}_{1}\frac{\sum_{i=1}^{n}{x_{i}}}{n}\right)\\
&\hat{\beta}_{1}\left(n\sum_{i=1}^{n}{x_{i}^{2}}-(\sum_{i=1}^{n}{x_{i}})^{2} \right)=
n\sum_{i=1}^{n}{x_{i}y_{i}}-\sum_{i=1}^{n}{x_{i}}\sum_{i=1}^{n}{y_{i}}\\
&\hat{\beta}_{1}=\frac{n\sum_{i=1}^{n}{x_{i}y_{i}}-\sum_{i=1}^{n}{x_{i}}\sum_{i=1}^{n}{y_{i}}}
{n\sum_{i=1}^{n}{x_{i}^{2}}-(\sum_{i=1}^{n}{x_{i}})^{2} }=\frac{\sigma_{xy}}{\sigma^{2}_{x}}
\end{split}\]
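As a sanity check, the two closed-form estimators can be computed directly and compared with \(\texttt{lm}\); a minimal sketch on the TV budget, assuming adv_data from the previous slides is in memory:
Code
x = adv_data$TV
y = adv_data$sales
b1_hat = cov(x, y) / var(x)          # sigma_xy / sigma^2_x (the n-1 factors cancel)
b0_hat = mean(y) - b1_hat * mean(x)  # beta0_hat = ybar - beta1_hat * xbar
c(b0_hat, b1_hat)
coef(lm(sales ~ TV, data = adv_data))  # should return the same two values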
Three regression lines
Code
three_regs_plot = avd_lm_models_nest |> unnest(model_fitted) |>
  select(medium, sales:.fitted) |>
  ggplot(aes(x = budget, y = sales)) + theme_minimal() + facet_grid(~ medium, scales = "free") +
  geom_point(alpha = .5, color = "indianred") +
  geom_smooth(method = "lm") +
  geom_segment(aes(x = budget, xend = budget, y = .fitted, yend = sales), color = "forestgreen", alpha = .25)
Advertising data: nested structure
The quantities nested in the tibble can be pulled out, or they can be expanded within the tibble itself, using \(\texttt{unnest}\)
Code
avd_lm_models_nest |> unnest(model_metrics) |>
  select(medium, r.squared:df.residual) |>
  kbl(digits = 4) |> kable_styling(font_size = 10) |>
  row_spec(1, background = "#D9FDEC") |>
  row_spec(2, background = "#CAF6FC") |>
  row_spec(3, background = "#FDB5BA")
| medium | r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TV | 0.6119 | 0.6099 | 3.2587 | 312.1450 | 0.0000 | 1 | -519.0457 | 1044.091 | 1053.986 | 2102.531 | 198 |
| newspaper | 0.0521 | 0.0473 | 5.0925 | 10.8873 | 0.0011 | 1 | -608.3357 | 1222.671 | 1232.566 | 5134.805 | 198 |
| radio | 0.3320 | 0.3287 | 4.2749 | 98.4216 | 0.0000 | 1 | -573.3369 | 1152.674 | 1162.569 | 3618.479 | 198 |
Linear regression: model assumptions
The linear regression model is
\[y_{i}=\beta_{0}+\beta_{1}x_{i}+\epsilon_{i}\]
where \(\epsilon_{i}\) is a random variable with expected value 0. For inference, more assumptions must be made:
\(\epsilon_{i}\sim N(0,\sigma^{2})\): the variance of the errors \(\sigma^{2}\) does not depend on \(x_{i}\).
\(Cov(\epsilon_{i},\epsilon_{i'})=0\), for all \(i\neq i'\) and \(i,i'=1,\ldots,n\).
\(x_{i}\) non-stochastic.
It follows that \(y_{i}\) is a random variable such that \(y_{i}\sim N(\beta_{0}+\beta_{1}x_{i},\sigma^{2})\).
Linear regression: model assumptions
[Figure from Statistics for Business and Economics (Anderson, Sweeney and Williams, 2011)]
\(\sigma^{2}\) estimator
The variance \(\sigma^{2}\) is assumed to be constant, but it is unknown, and it has to be estimated:
since \(y_{i}\sim N(\beta_{0}+\beta_{1}x_{i},\sigma^{2})\) , it follows that
\[\frac{y_{i}-\beta_{0}-\beta_{1}x_{i}}{\sigma}\sim N(0,1)\]
furthermore, recall that the sum of \(n\) squared independent standard normal variables follows a \(\chi^{2}_{n}\) distribution, then
\[\sum_{i=1}^{n}\left[\frac{y_{i}-\beta_{0}-\beta_{1}x_{i}}{\sigma}\right]^{2}\sim\chi^{2}_{n}\]
replacing \(\beta_{0}\) and \(\beta_{1}\) with their estimators \(\hat{\beta}_{0}\) and \(\hat{\beta}_{1}\) , the previous becomes
\[\frac{\sum_{i=1}^{n}\left(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i}\right)^{2}}{\sigma^{2}}=\frac{RSS}{\sigma^{2}}\sim\chi^{2}_{n-2}\] two degrees of freedom are lost as the two parameters are replaced by their estimators.
\(\sigma^{2}\) estimator
Finally, since \(E\left[\chi^{2}_{n-2}\right]=n-2\) then
\[E\left[\chi^{2}_{n-2}\right]=E\left[\frac{RSS}{\sigma^{2}}\right]=n-2\]
and because \(\sigma^{2}\) is constant, it can be pulled out of the expectation, so we can rewrite
\[\frac{E\left[RSS\right]}{\sigma^{2}}=n-2\]
it follows that \[\sigma^{2}=E\left[\frac{RSS}{n-2}\right]\] and \(\frac{RSS}{n-2}\) is an unbiased estimator of \(\sigma^{2}\) .
\(\sqrt{\frac{RSS}{n-2}}=RSE\) , the so-called residual standard error
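A quick numerical check that the RSE formula matches what \(\texttt{lm}\) reports; a sketch (the name tv_fit is ours):
Code
tv_fit = lm(sales ~ TV, data = adv_data)
rss = sum(residuals(tv_fit)^2)  # residual sum of squares
n = nobs(tv_fit)
sqrt(rss / (n - 2))  # RSE from the formula
sigma(tv_fit)        # the estimate reported by lm (the sigma column of glance())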
\(\hat{\beta}_{1}\) as a linear combination of \(y_{i}\)
\[\begin{split}
\hat{\beta}_{1}&= \frac{\sum_{i=1}^{n}{\left(y_{i}-\bar{y}\right)\left(x_{i}-\bar{x}\right)} }{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}}
=\frac{\sum_{i=1}^{n}{\left[y_{i} \left(x_{i}-\bar{x}\right) -\bar{y}\left(x_{i}-\bar{x}\right)\right]} }{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}}=\\
&=\frac{\sum_{i=1}^{n}{y_{i} \left(x_{i}-\bar{x}\right)} -\bar{y}\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)}}{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}}
= \sum_{i=1}^{n}{y_{i} \frac{\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}}}=
\sum_{i=1}^{n}{w_{i}y_{i}}\end{split}\]
Given a linear combination \(U_{i}=c+d V_{i}\), if \(V_{i}\sim N(\mu_{v},\sigma^{2}_{v})\), then \(U_{i}\sim N(c+d\mu_{v},d^{2}\sigma^{2}_{v})\).
Since \(y_{i}\sim N(\beta_{0}+\beta_{1}x_{i},\sigma^{2})\) and \(\hat{\beta}_{1}=\sum_{i=1}^{n}{w_{i}y_{i}}\) (so \(c=0\) and \(d=w_{i}\)), it follows that
\[\hat{\beta}_{1}\sim N(\sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})},\sum_{i=1}^{n}{w_{i}^{2}\sigma^{2}})\]
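The weights \(w_{i}\) can be computed explicitly to verify that \(\hat{\beta}_{1}\) is indeed a linear combination of the \(y_{i}\); a minimal sketch, again on the TV budget:
Code
x = adv_data$TV
y = adv_data$sales
w = (x - mean(x)) / sum((x - mean(x))^2)  # the weights w_i
sum(w)              # the weights sum to 0
sum(w * y)          # beta1_hat as a linear combination of the y_i
coef(lm(y ~ x))[2]  # the same value from lm()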
Parameters of the \(\hat{\beta}_{1}\) distribution: \(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)
The expectation of \(\hat{\beta}_{1}\) is
\(E\left[\hat{\beta}_{1}\right] = \sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)
\(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\beta_{1}\frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{\sum_{i=1}^{n}{x_{i}^2}-\bar{x}\sum_{i=1}^{n}{x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\left(\sum_{i=1}^{n}{x_{i}^2}-\frac{\sum_{i=1}^{n}{x_{i}}}{n}\sum_{i=1}^{n}{x_{i}}\right)=\)
\(=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n}=\)
Note: \(\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n} \frac{1}{n} =\frac{n\sum_{i=1}^{n}{x_{i}^2}}{n^{2}}-\frac{\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n^{2}}=\frac{\sum_{i=1}^{n}{x_{i}^2}}{n}-\bar{x}^{2}=var(x)\)
therefore \(\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n} = var(x)n=\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\)
\(=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n}=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}} \sum_{i=1}^{n}(x_{i}-\bar{x})^{2}=\beta_{1}\)
Parameters of the \(\hat{\beta}_{1}\) distribution: \(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)
The variance of \(\hat{\beta}_{1}\) is
\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}} =\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}=\sum_{i=1}^{n}{\frac{(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\sigma^{2}\)
And the standard error of \(\hat{\beta}_{1}\) is
\(SE\left(\hat{\beta}_{1}\right) = \sqrt{\frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\)
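The standard error formula can be checked numerically, plugging the RSE in for the unknown \(\sigma\); a sketch, to be compared with the std.error column returned by \(\texttt{tidy}\):
Code
tv_fit = lm(sales ~ TV, data = adv_data)
x = adv_data$TV
sigma(tv_fit) / sqrt(sum((x - mean(x))^2))  # SE(beta1_hat), RSE in place of sigma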
Inference on \(\hat{\beta}_{1}\) : hypothesis testing
The null hypothesis is \(\beta_{1}=0\) and the test statistic is
\[\frac{\hat{\beta}_{1}-\beta_{1}}{SE(\hat{\beta}_{1})}\]
which, under the null hypothesis, becomes just \(\frac{\hat{\beta}_{1}}{SE(\hat{\beta}_{1})}\), distributed as a Student's t with \(n-2\) d.f.
\[\text{Reject } H_{0} \text{ if } \left|\frac{\hat{\beta}_{1}}{SE(\hat{\beta}_{1})}\right|>t_{1-\frac{\alpha}{2},n-2}\]
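A sketch of the test carried out by hand; the values should match the statistic and p.value columns of \(\texttt{tidy}\):
Code
tv_fit = lm(sales ~ TV, data = adv_data)
x = adv_data$TV
t_stat = coef(tv_fit)["TV"] / (sigma(tv_fit) / sqrt(sum((x - mean(x))^2)))
t_stat                                                          # test statistic under H0
2 * pt(abs(t_stat), df = nobs(tv_fit) - 2, lower.tail = FALSE)  # two-sided p-value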
Linear regression: multiple predictors
\(y\) is the numeric response; \(X_{1},\ldots,X_{p}\) are the predictors, the general formula for the linear model is \[y = f(X)+\epsilon=\beta_{0}+\sum_{j=1}^{p}X_{j}\beta_{j}+ \epsilon\]
In algebraic form
\[{\bf y}={\bf X}{\bf \beta}+{\bf \epsilon}\]
\[\begin{bmatrix}
y_{1}\\
y_{2}\\
y_{3}\\
\vdots\\
y_{n}
\end{bmatrix}=\begin{bmatrix}
1& x_{1,1}&\ldots&x_{1,p}\\
1& x_{2,1}&\ldots&x_{2,p}\\
1& x_{3,1}&\ldots&x_{3,p}\\
\vdots&\vdots&\vdots&\\
1& x_{n,1}&\ldots&x_{n,p}\\
\end{bmatrix}\begin{bmatrix}
\beta_{0}\\
\beta_{1}\\
\vdots\\
\beta_{p}\\
\end{bmatrix}+\begin{bmatrix}
\epsilon_{1}\\
\epsilon_{2}\\
\epsilon_{3}\\
\vdots\\
\epsilon_{n}
\end{bmatrix}\]
Linear regression: ordinary least square (OLS)
OLS target
\[\min_{\hat{\beta}_{0},\hat{\beta}_{1},\ldots,\hat{\beta}_{p}}\sum_{i=1}^{n}\left(y_{i}-\hat{\beta}_{0}-\sum_{j=1}^{p}x_{ij}\hat{\beta}_{j}\right)^{2}=\min_{\hat{\beta}_{0},\hat{\beta}_{1},\ldots,\hat{\beta}_{p}}RSS\]
OLS target (algebraic) \[ \min_{\hat{\beta}} ({\bf y}-{\bf X}{\bf \hat{\beta}})^{\sf T}({\bf y}-{\bf X}{\bf \hat{\beta}})\]
Linear regression: ordinary least square (OLS) solution
OLS target
\[\min_{\hat{\beta}}({\bf y}-{\bf X}{\bf \hat{\beta}})^{\sf T}({\bf y}-{\bf X}{\bf \hat{\beta}})=\min_{\hat{\beta}}\left({\bf y}^{\sf T}{\bf y}-{\bf y}^{\sf T}{\bf X} {\bf \hat{\beta}}-{\bf{\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf y}+{\bf {\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf X} {\bf \hat{\beta}}\right)\]
First order conditions
\[\partial_{\hat{\beta}}\left({\bf y}^{\sf T}{\bf y}-{\bf y}^{\sf T}{\bf X} {\bf \hat{\beta}}-{\bf{\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf y}+{\bf {\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf X} {\bf \hat{\beta}}\right)=0\]
Since \(\partial_{\bf\hat{\beta}}\left({\bf y}^{\sf T}{\bf X} {\bf \hat{\beta}}\right)=\partial_{\bf\hat{\beta}}\left({ \bf {\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf y}\right)={\bf X}^{\sf T}{\bf y}\) it results that
\[\partial_{\hat{\beta}}RSS=-2{\bf X}^{\sf T}{\bf y}+2{\bf X}^{\sf T}{\bf X} {\bf \hat{\beta}}=0\]
And the OLS solution is
\[ \hat{\beta} = ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf y} \]
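The closed-form solution can be verified numerically on the advertising data; a minimal sketch in base R, where \(\texttt{model.matrix}\) builds \({\bf X}\) with the intercept column (it assumes adv_data is in memory):
Code
X = model.matrix(sales ~ TV + radio + newspaper, data = adv_data)
y = adv_data$sales
beta_hat = solve(t(X) %*% X, t(X) %*% y)  # (X'X)^{-1} X'y via the normal equations
beta_hat
coef(lm(sales ~ TV + radio + newspaper, data = adv_data))  # same values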
Distributional assumptions
\(\epsilon \sim N(0,\sigma^{2}{\bf I})\) , assuming non-stochastic predictors, it follows that
\[E[{\bf y}]= E[{\bf X}\beta+\epsilon]={\bf X}\beta \ \ \text{ and } \ \ var[{\bf y}]= \sigma^{2}{\bf I}\]
Distribution of linear combinations
\[ \text{if } \quad {\bf U}\sim N({\bf \mu},{\bf\Sigma}) \quad \text{ and } \quad {\bf V} = {\bf c}+{\bf D}{\bf U} \quad \text{then} \quad {\bf V}\sim N({\bf c}+{\bf D}\mu,{\bf D}{\bf \Sigma}{\bf D}^{\sf T})\]
Distribution of \(\hat{\beta}\)
\(\hat{\beta} = ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf y}\) is a linear combination of \(\bf y\) , with \({\bf c} = 0\) and \({\bf D} = ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}\) , then
\(E[\hat{\beta}]={\bf D} E[y] = {\bf D}{\bf X}\beta= ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf X}\beta = \beta\)
\(var(\hat{\beta})= {\bf D} \sigma^{2}{\bf D}^{\sf T}=({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}\sigma^{2}[({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}]^{\sf T} = \sigma^{2}({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf X}({\bf X}^{\sf T}{\bf X})^{-1}=\sigma^{2}({\bf X}^{\sf T}{\bf X})^{-1}\)
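A sketch checking \(\sigma^{2}({\bf X}^{\sf T}{\bf X})^{-1}\) against \(\texttt{vcov}\), with \(\sigma\) estimated by the RSE (the name adv_lm is ours):
Code
adv_lm = lm(sales ~ TV + radio + newspaper, data = adv_data)
X = model.matrix(adv_lm)
sigma(adv_lm)^2 * solve(t(X) %*% X)  # sigma^2 (X'X)^{-1}, sigma estimated by the RSE
vcov(adv_lm)                         # the same matrix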
Advertising data: multiple linear regression
At this stage there is no pre-processing and no test-set evaluation: just fit the regression model on the whole data set.
pre-processing: define a \(\texttt{recipe}\)
Code
adv_recipe = recipe(sales ~ ., data = adv_data)
model specification: define a \(\texttt{parsnip}\) model
Code
adv_model = linear_reg(mode = "regression", engine = "lm")
define the \(\texttt{workflow}\)
Code
adv_wflow = workflow() |>
  add_recipe(recipe = adv_recipe) |>
  add_model(adv_model)
fit the model
Code
adv_fit = adv_wflow |>
  fit(data = adv_data)
Advertising data: back to single models
| medium | term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|
| TV | (Intercept) | 7.0326 | 0.4578 | 15.3603 | 0.0000 |
| TV | budget | 0.0475 | 0.0027 | 17.6676 | 0.0000 |
| newspaper | (Intercept) | 12.3514 | 0.6214 | 19.8761 | 0.0000 |
| newspaper | budget | 0.0547 | 0.0166 | 3.2996 | 0.0011 |
| radio | (Intercept) | 9.3116 | 0.5629 | 16.5422 | 0.0000 |
| radio | budget | 0.2025 | 0.0204 | 9.9208 | 0.0000 |
Advertising data: single vs multiple regression
Code
avd_lm_models_nest |> unnest(model_params) |>
  select(medium, term:p.value) |>
  filter(term != "(Intercept)") |> arrange(medium) |>
  kbl(digits = 4) |> kable_styling(font_size = 12) |>
  column_spec(c(3, 6), bold = TRUE) |>
  row_spec(1, background = "#D9FDEC") |>
  row_spec(2, background = "#CAF6FC") |>
  row_spec(3, background = "#FDB5BA")
| medium | term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|
| TV | budget | 0.0475 | 0.0027 | 17.6676 | 0.0000 |
| newspaper | budget | 0.0547 | 0.0166 | 3.2996 | 0.0011 |
| radio | budget | 0.2025 | 0.0204 | 9.9208 | 0.0000 |
Code
adv_fit |> tidy() |>
  filter(term != "(Intercept)") |> arrange(term) |>
  kbl(digits = 4) |> kable_styling(font_size = 12) |>
  column_spec(c(2, 5), bold = TRUE) |>
  row_spec(1, background = "#D9FDEC") |>
  row_spec(2, background = "#CAF6FC") |>
  row_spec(3, background = "#FDB5BA")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| TV | 0.0458 | 0.0014 | 32.8086 | 0.0000 |
| newspaper | -0.0010 | 0.0059 | -0.1767 | 0.8599 |
| radio | 0.1885 | 0.0086 | 21.8935 | 0.0000 |
Why these contradictory results?
Advertising data: single vs multiple regression
Code
library ("corrr" )
adv_data |> correlate () |> network_plot ()
\(\texttt{newspaper}\) is correlated with \(\texttt{radio}\), and the latter has a significant effect on \(\texttt{sales}\) in the multiple regression model
that is why, in the single model, \(\texttt{newspaper}\) appears to have a significant effect on \(\texttt{sales}\) ( \(\texttt{radio}\) is ignored)
regression by successive orthogonalizations
computing multiple regression coefficients by means of single regressions
the orthogonal predictors case
Check the previous claim, given the predictors matrix
\[{\bf X}=\begin{bmatrix}{\bf x}_{1}&{\bf x}_{2}\end{bmatrix}\]
and that
\[{\bf \hat{\beta}}=\left({\bf X}^{\sf T}{\bf X}\right)^{-1}{\bf X}^{\sf T}{\bf y} \rightarrow \left({\bf X}^{\sf T}{\bf X}\right)\hat{\beta} - {\bf X}^{\sf T}{\bf y}={\bf X}^{\sf T}\left({\bf X}\hat{\beta} - {\bf y}\right)=0\]
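A small simulated check of the claim (the toy data are ours): with pairwise orthogonal predictors, the single-regression slopes coincide exactly with the multiple-regression coefficients.
Code
set.seed(1)
n = 100
x1 = rnorm(n)
x2 = residuals(lm(rnorm(n) ~ x1))  # x2 orthogonal to x1 (and mean 0) by construction
y = 1 + 2 * x1 - 3 * x2 + rnorm(n)
coef(lm(y ~ x1 + x2))[2:3]                   # multiple regression coefficients
c(coef(lm(y ~ x1))[2], coef(lm(y ~ x2))[2])  # single regressions: identical slopes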
the non-orthogonal predictors case
In real data, the predictors are essentially never all pairwise orthogonal.
successive orthogonalizations
Regression via successive orthogonalizations
Consider the four-predictor model (with no intercept)
\[y=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\beta_{4}X_{4}+\epsilon\]
Compute the value for \(\hat{\beta}_{4}\) using successive orthogonalizations:
set \({\bf z}_{1}={\bf x}_{1}\) and fit \({\bf x}_{2}=\beta_{2|1}{\bf z}_{1}+\epsilon\)
compute \(\hat{\beta}_{2|1}=\frac{\langle {\bf z}_{1},{\bf x}_{2}\rangle}{\langle {\bf z}_{1},{\bf z}_{1}\rangle}\)
compute \({\bf z}_{2}={\bf x}_{2}-\hat{\beta}_{2|1}{\bf z}_{1}\)
fit \({\bf x}_{3}=\beta_{3|1}{\bf z}_{1}+\epsilon\) and \({\bf x}_{3}=\beta_{3|2}{\bf z}_{2}+\epsilon\)
compute \(\hat{\beta}_{3|1}=\frac{\langle {\bf z}_{1},{\bf x}_{3}\rangle}{\langle {\bf z}_{1},{\bf z}_{1}\rangle}\) and \(\hat{\beta}_{3|2}=\frac{\langle {\bf z}_{2},{\bf x}_{3}\rangle}{\langle {\bf z}_{2},{\bf z}_{2}\rangle}\)
compute \({\bf z}_{3}={\bf x}_{3}-\hat{\beta}_{3|1}{\bf z}_{1}-\hat{\beta}_{3|2}{\bf z}_{2}\)
fit \({\bf x}_{4}=\beta_{4|1}{\bf z}_{1}+\epsilon\) , \({\bf x}_{4}=\beta_{4|2}{\bf z}_{2}+\epsilon\) and \({\bf x}_{4}=\beta_{4|3}{\bf z}_{3}+\epsilon\)
compute \(\hat{\beta}_{4|1}=\frac{\langle {\bf z}_{1},{\bf x}_{4}\rangle}{\langle {\bf z}_{1},{\bf z}_{1}\rangle}\) , \(\hat{\beta}_{4|2}=\frac{\langle {\bf z}_{2},{\bf x}_{4}\rangle}{\langle {\bf z}_{2},{\bf z}_{2}\rangle}\) and \(\hat{\beta}_{4|3}=\frac{\langle {\bf z}_{3},{\bf x}_{4}\rangle}{\langle {\bf z}_{3},{\bf z}_{3}\rangle}\)
compute \({\bf z}_{4}={\bf x}_{4}-\hat{\beta}_{4|1}{\bf z}_{1}-\hat{\beta}_{4|2}{\bf z}_{2}-\hat{\beta}_{4|3}{\bf z}_{3}\)
fit \({\bf y}=\beta_{4}{\bf z}_{4}+\epsilon\)
compute \(\hat{\beta}_{4}=\frac{\langle {\bf z}_{4},{\bf y}\rangle}{\langle {\bf z}_{4},{\bf z}_{4}\rangle}\)
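A minimal sketch of the whole recursion on the advertising data (the names Z and b_ji are ours); the intercept column of the model matrix plays the role of \({\bf z}_{1}\), and the single regression of \({\bf y}\) on the last residual reproduces the multiple-regression coefficient of the last predictor:
Code
X = model.matrix(sales ~ TV + radio + newspaper, data = adv_data)  # column 1: intercept
y = adv_data$sales
Z = X
for (j in 2:ncol(X)) {
  for (i in 1:(j - 1)) {
    b_ji = sum(Z[, i] * X[, j]) / sum(Z[, i]^2)  # beta_hat_{j|i} = <z_i, x_j>/<z_i, z_i>
    Z[, j] = Z[, j] - b_ji * Z[, i]              # x_j adjusted wrt the previous z's
  }
}
sum(Z[, 4] * y) / sum(Z[, 4]^2)  # <z_4, y>/<z_4, z_4>
coef(lm(sales ~ TV + radio + newspaper, data = adv_data))["newspaper"]  # same value
# reorder the columns to obtain the other coefficients the same way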
Regression via successive orthogonalizations: a further look
The residuals vector \({\bf e}=\left({\bf y}-{\bf \hat{y}}\right)\) is orthogonal to the predictor \({\bf x}\) , that is \(\left({\bf y}-{\bf \hat{y}}\right)^{\sf T}{\bf x}=0\)
\[\begin{split}
({\bf y}-{\bf \hat{y}})^{\sf T}{\bf x}&={\bf y}^{\sf T}{\bf x}-{\bf \hat{y}}^{\sf T}{\bf x} \\
&={\bf y}^{\sf T}{\bf x} - \underbrace{({\bf x} ({\bf x}^{\sf T}{\bf x})^{-1}{\bf x}^{\sf T}{\bf y})^{\sf T}}_{\bf \hat{y}^{\sf T}}{\bf x}= \\
&={\bf y}^{\sf T}{\bf x}- {\bf y}^{\sf T}{\bf x} ({\bf x}^{\sf T}{\bf x})^{-1}{\bf x}^{\sf T}{\bf x}= {\bf y}^{\sf T}{\bf x}- {\bf y}^{\sf T}{\bf x}=0
\end{split}\]
recall that \({\bf z}_{2}={\bf x}_{2}-\hat{\beta}_{2|1}{\bf z}_{1}\); then \({\bf z}_{2}\) is orthogonal to \({\bf z}_{1}\), and since \({\bf z}_{1}={\bf x}_{1}\), \({\bf z}_{2}\) is orthogonal to \({\bf x}_{1}\), too.
\({\bf z}_{2}\) is \({\bf x}_{2}\) adjusted to be orthogonal to \({\bf z}_{1}\) ( \({\bf x}_{1}\) ).
\({\bf z}_{3}\) is \({\bf x}_{3}\) adjusted to be orthogonal to \({\bf z}_{2}\) and to \({\bf z}_{1}\) .
\({\bf z}_{4}\) is \({\bf x}_{4}\) adjusted to be orthogonal to \({\bf z}_{3}\) , \({\bf z}_{2}\) and to \({\bf z}_{1}\) .
the multiple regression-based \(\hat{\beta}_{j}\) can be obtained via the single regression \({\bf y}=\hat{\beta}_{j}{\bf z}_{j}+\epsilon\): \({\bf z}_{j}\) is \({\bf x}_{j}\) adjusted with respect to the other predictors.
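A one-line numerical check of this orthogonality on the TV regression (tv_fit as defined earlier):
Code
tv_fit = lm(sales ~ TV, data = adv_data)
sum(residuals(tv_fit) * adv_data$TV)  # 0 up to floating-point error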
Regression via successive orthogonalizations: some considerations
the multiple regression-based \(\hat{\beta}_{j}\) can be obtained via the single regression \({\bf y}=\hat{\beta}_{j}{\bf z}_{j}+\epsilon\)
If the predictors are pairwise orthogonal, all the adjustment coefficients \(\hat{\beta}_{j|i}\) are 0, so \({\bf z}_{j}={\bf x}_{j}\), \(j=1,\ldots,p\).
So \(\hat{\beta}_{j}=\frac{\langle {\bf z}_{j}, {\bf y}\rangle}{\langle {\bf z}_{j}, {\bf z}_{j}\rangle}=\frac{\langle {\bf x}_{j}, {\bf y}\rangle}{\langle {\bf x}_{j}, {\bf x}_{j}\rangle}\) , just a single regression of \({\bf y}\) on \({\bf x}_{j}\)
if \({\bf x}_{j}\) is highly correlated with one or more of the other predictors, the squared norm \(\|{\bf z}_{j}\|^{2}\approx 0\)
this makes sense, since such a predictor barely adds any new information on \({\bf y}\)
as \(\|{\bf z}_{j}\|^{2} = \langle {\bf z}_{j}, {\bf z}_{j}\rangle\approx 0\) , the variability of the OLS estimator explodes, since \(SE(\hat{\beta}_{j})=\frac{\sigma}{\sqrt{\|{\bf z}_{j}\|^{2}}}\)
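A small simulated illustration (our own toy example): an almost collinear predictor yields \(\|{\bf z}_{j}\|^{2}\approx 0\) and inflated standard errors.
Code
set.seed(1)
n = 200
x1 = rnorm(n)
x2 = x1 + rnorm(n, sd = 0.01)  # almost collinear with x1
y = 1 + x1 + x2 + rnorm(n)
sum(residuals(lm(x2 ~ x1))^2)                          # ||z_2||^2, close to 0
summary(lm(y ~ x1 + x2))$coefficients[, "Std. Error"]  # inflated SEs for x1 and x2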
Interactions
Code
library(scatterplot3d)
adv_fit_update = adv_wflow |> update_recipe(adv_recipe |> step_rm("newspaper")) |>
  fit(data = adv_data)
data3d = adv_fit_update |> extract_fit_engine() |> augment(data = adv_data) |>
  mutate(und_ov_est = ifelse(.std.resid < 0, "indianred", "blue")) |>
  select(TV, radio, sales, und_ov_est)
plot_3d = scatterplot3d(x = data3d$TV, y = data3d$radio, z = data3d$sales,
  xlab = "TV", ylab = "radio", zlab = "sales",
  color = data3d$und_ov_est, pch = 16, box = FALSE, grid = TRUE)
plot_3d$plane3d(adv_fit_update |> extract_fit_engine(), draw_polygon = TRUE,
  polygon_args = list(col = rgb(.1, .6, .4, .25)))