Linear models

Code
library(magick)
library(tidyverse)
library(purrr)
library(tidymodels)
library(discrim)
library(kableExtra)
library(janitor)
library(patchwork)

The general formula is \(Y=f(X)+\epsilon\)

  • Estimate \(f(.)\) via \(\hat{f}(.)\)

  • If \(Y\) is numeric, and \(X\) is a single predictor, assuming \(f(.)\) is linear, the model is

\[y_{i} = \beta_{0} + \beta_{1} x_{i} + \epsilon_{i}\]

pros

  • only the two parameters \(\hat{\beta}_{0}\) and \(\hat{\beta}_{1}\) need to be estimated to obtain \(\hat{f}(.)\)

  • straightforward interpretation, also in the multiple-predictor case.

cons

  • \(\hat{f}(.)\) might have high bias and lead to poor predictions

The aim is to remove the linearity assumption and improve the predictive power, while still being able to interpret the results

Toy example: linear case \((f(x_{i}) = 1 + .75 x_{i})\)

Code
set.seed(1234)
x = runif(1000,min=0,max = 10)   # predictor values
eps = rnorm(1000,0,1.5)          # noise term
b_0 = 1
b_1 = .75 
y = b_0 + b_1 * x + eps          # linear f(x) plus noise
pop_xy = tibble(y=y, x=x)        # population of 1000 points
sam_xy = pop_xy |> 
  slice(sample(1:1000,30))       # training sample of 30 points

sam_xy |> 
  ggplot(aes(x = x,y = y)) + 
  geom_point(size=3,alpha=.5,
             color="darkblue")+
  xlim(c(0,11))+ylim(c(0,11))+
  theme_minimal()+
  geom_smooth(method="lm",
              color="darkgreen")

Toy example: non-linear case \((f(x_{i}) = 1 + 3 x_{i} - .5 x^{2}_{i})\)

Code
set.seed(1234)
x = runif(1000,min=0,max = 10) 
eps = rnorm(1000,0,1.5) 
b_0 = 1
b_1 = 3
b_2 = -.5
y = b_0 + b_1 * x + b_2 * (x^2) + eps 
pop_xy = tibble(y=y, x=x)
sam_xy = pop_xy |> 
  slice(sample(1:1000,30))

sam_xy |> 
ggplot(aes(x = x,y = y)) + 
  geom_point(size=3,alpha=.5,
             color="darkblue")+
  theme_minimal() +
  geom_smooth(method="lm",
              color="red")+ 
  geom_smooth(method = "loess",
              color="darkgreen")

Non-linear approaches

Non-linear approaches are often extensions of the linear ones; for now, a single predictor \(X\) is considered.

Polynomial regression

It is the simplest way to estimate \(f()\) via a non-linear \(\hat{f}()\): it suffices to replace the predictor \(X\) with its powers up to degree \(d\), \(y_{i}=\beta_{0}+\beta_{1}x_{i}+\beta_{2}x_{i}^{2}+\ldots+\beta_{d}x_{i}^{d}+\epsilon_{i}\).
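
As a quick base-R sketch of the same idea (using the non-linear toy sample sam_xy generated above; the tidymodels version is developed below), the fit reduces to a single lm() call on the powers of the predictor:

Code
# sketch: degree-2 raw polynomial fit on the toy sample sam_xy from above
poly_fit_sketch = lm(y ~ poly(x, degree = 2, raw = TRUE), data = sam_xy)
coef(poly_fit_sketch)  # estimates of beta_0, beta_1, beta_2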

Step functions

the \(X\) range is split into \(K\) regions \(R_{j}\), \(j=1,\ldots,K\); within each \(R_{j}\), \(y_{i}\) is averaged over the \(i\)’s such that \(x_{i}\in R_{j}\). The resulting \(\hat{f}()\) is piecewise constant.
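
A minimal sketch on the same toy sample, with hypothetical hand-picked breaks: cut() builds the region factor, and regressing on it returns the within-region averages (with treatment contrasts, the intercept is the first-region mean and the other coefficients are offsets from it):

Code
# sketch: piecewise-constant fit via cut() + lm(); breaks chosen by hand
step_fit_sketch = lm(y ~ cut(x, breaks = c(0, 2.5, 5, 7.5, 10),
                             include.lowest = TRUE),
                     data = sam_xy)
coef(step_fit_sketch)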

Regression splines

a combination of polynomial regression and step functions: a polynomial regression is fit in each of the \(K\) regions \(R_{j}\); the result is not just a piecewise polynomial \(\hat{f}()\), since continuity at the knots must be ensured.

Smoothing splines

Similar to regression splines, but the model flexibility is controlled by a smoothness penalty.

Local Regression

In this case \(\hat{f}()\) is fitted on overlapping (non-disjoint) regions of \(X\).

Polynomial regression

Study the dependence of wage on age, using the Wage dataset from the package ISLR2

Code
library(ISLR2)
data(Wage)

w_vs_a = Wage |> 
  as_tibble() |>             # convert the data frame to a tibble
  dplyr::select(wage,age) |> 
  mutate(age = as.double(age))

w_vs_a |> 
  ggplot(aes(x=age,y=wage))+
  geom_point(alpha=.5,color="orange")+
  theme_minimal()+
  geom_smooth(method="lm") 

Degree 4 polynomial regression

Pre-processing (recipe) specification

Code
pre_proc_w_vs_a = w_vs_a |> 
  recipe(wage~age) |> 
  step_poly(age,degree=4,
            options=list(raw=TRUE)) 

Pre-processed data

Code
pre_proc_w_vs_a |>
  prep() |> bake(new_data=NULL) |>
  slice(1:6) |> 
  knitr::kable(format="html",digits=2)
  wage  age_poly_1  age_poly_2  age_poly_3  age_poly_4
 75.04          18         324        5832      104976
 70.48          24         576       13824      331776
130.98          45        2025       91125     4100625
154.69          43        1849       79507     3418801
 75.04          50        2500      125000     6250000
127.12          54        2916      157464     8503056

Degree 4 polynomial regression

model specification

Code
linear_model_spec = linear_reg(
  mode="regression")|>
  set_engine("lm")

workflow specification: combine recipe and model specifications

Code
poly_wflow = workflow() |> 
  add_recipe(pre_proc_w_vs_a) |> 
  add_model(linear_model_spec)

Degree 4 polynomial regression

fit the model: indicate the data to apply the workflow to

Code
poly_fit = poly_wflow |> 
  fit(data=w_vs_a)

model estimates

Code
poly_fit |> tidy() |>
  knitr::kable(digits=2)
term         estimate  std.error  statistic  p.value
(Intercept)   -184.15      60.04      -3.07     0.00
age_poly_1      21.25       5.89       3.61     0.00
age_poly_2      -0.56       0.21      -2.74     0.01
age_poly_3       0.01       0.00       2.22     0.03
age_poly_4       0.00       0.00      -1.95     0.05

Polynomial fit

To obtain point-wise confidence intervals, one needs \(SE\left[\hat{f}(x_{0})\right]\)

Each observation \(x_{0}\) is associated with the vector \({\bf \ell}^{\sf T}_{0}=[1,x_{0},x_{0}^{2},x_{0}^{3},x_{0}^{4}]\), then

\[SE\left[\hat{f}(x_{0})\right] = \left[{\bf \ell}^{\sf T}_{0} {\bf C}{\bf \ell}_{0} \right]^{1/2}\]

where \({\bf C}\) is the covariance matrix of the coefficient estimators

\[ {\bf C}= \begin{bmatrix} SE({\hat{\beta}_{0}})^{2} & cov(\hat{\beta}_{0},\hat{\beta}_{1}) & \ldots & \ldots & cov(\hat{\beta}_{0},\hat{\beta}_{4}) \\ cov(\hat{\beta}_{1},\hat{\beta}_{0}) &SE({\hat{\beta}_{1}})^{2}& \ldots& \ldots & cov(\hat{\beta}_{1},\hat{\beta}_{4}) \\ \vdots &\vdots & SE({\hat{\beta}_{2}})^{2} & \vdots & \vdots \\ \vdots &\vdots & \vdots & \ldots & \vdots \\ cov(\hat{\beta}_{4},\hat{\beta}_{0}) &\ldots & \ldots & \ldots &SE({\hat{\beta}_{4}})^{2}\\ \end{bmatrix} \]

Why?

\(\hat{f}(x_{0})= \ell_{0}^{\sf T}\hat{\beta}\)

\(\hat{f}(x_{0})\) is a linear combination of \(\hat{\beta}\) with \({\bf D} = \ell_{0}^{\sf T}\) , so

\[SE(\hat{f}(x_{0}))=[var(\hat{f}(x_{0}))]^{1/2}=[{\bf D}\hat{\sigma}^{2}({\bf X}^{\sf T}{\bf X})^{-1}{\bf D}^{\sf T}]^{1/2} = [\ell_{0}^{\sf T}{\bf C}\ell_{0}]^{1/2}\]

Therefore the 95% confidence interval is

\[\hat{f}(x_{0})\pm z_{\alpha/2}SE(\hat{f}(x_{0}))=\hat{f}(x_{0})\pm 1.96SE(\hat{f}(x_{0}))\]

Polynomial regression: covariance matrix of \(\hat{\beta}\)

Code
C_mat = poly_fit |>
  extract_fit_engine() |>
  vcov()

C_mat |>
knitr::kable(format="html",
             digits=4)
             (Intercept)  age_poly_1  age_poly_2  age_poly_3  age_poly_4
(Intercept)    3604.8469   -351.2230     12.1027     -0.1760       9e-04
age_poly_1     -351.2230     34.6538     -1.2072      0.0177      -1e-04
age_poly_2       12.1027     -1.2072      0.0425     -0.0006       0e+00
age_poly_3       -0.1760      0.0177     -0.0006      0.0000       0e+00
age_poly_4        0.0009     -0.0001      0.0000      0.0000       0e+00

New values

Create a grid of values for age

Code
grid_age = tibble(age=
            seq(from=min(w_vs_a$age),
            to=max(w_vs_a$age),
            by=0.01)
            )

Create a matrix with \(\ell_{0}\)’s on rows

Code
l_0s = poly_fit |> 
  extract_recipe() |> 
  bake(new_data=grid_age) |>
  mutate(int=1) |> 
  select(int,
         starts_with("age"))
int  age_poly_1  age_poly_2  age_poly_3  age_poly_4
  1       18.00      324.00     5832.00    104976.0
  1       18.01      324.36     5841.73    105209.5
  1       18.02      324.72     5851.46    105443.3
  1       18.03      325.08     5861.21    105677.6
  1       18.04      325.44     5870.97    105912.2

Two ways to compute \(SE(\hat{f}(x_{0}))\)

Computation using \({\bf \ell}_{0}^{\sf T}{\bf C}{\bf \ell}_{0}\)

Code
SE_fx_0s = l_0s |> pmap(~c(...)) |>   # one numeric vector l_0 per grid row
  map(.f=function(x)
    sqrt(t(x) %*% C_mat %*% t(t(x)))  # [l_0^T C l_0]^(1/2)
    ) |>
  unlist()

Computation using predict()

Code
x_0s_preds = poly_fit |> extract_fit_engine() |>
  predict(newdata = 
          poly_fit |> 
          extract_recipe() |>
          bake(new_data=grid_age),
          se.fit=TRUE)
lt_C_l  pred_se
  5.30     5.30
  5.29     5.29
  5.28     5.28
  5.27     5.27
  5.26     5.26
  5.25     5.25

Confidence band computation and plot

Code
poly_fun = grid_age |> 
  mutate(preds=x_0s_preds$fit,
  low_band =x_0s_preds$fit - 1.96*x_0s_preds$se.fit,
  up_band =x_0s_preds$fit + 1.96*x_0s_preds$se.fit
         )

w_vs_a |> 
  ggplot(aes(x=age,y=wage))+
  geom_point(alpha=.5,color="orange")+
  geom_point(data=poly_fun,aes(x=age,y=preds),size=.05,color="blue")+
  geom_ribbon(data=poly_fun,
              aes(x=age,ymin=low_band,ymax=up_band)
              ,fill="green",color="green",alpha=.25,inherit.aes = F)+
  theme_minimal()

Confidence band (tidy) computation and plot

Code
  poly_fun = poly_fit |> 
    augment(new_data = grid_age) |> 
    mutate(poly_fit |> 
             predict(new_data=grid_age,
                     type="conf_int"))

w_vs_a |>
  ggplot(aes(x=age,y=wage))+
  geom_point(alpha=.5,color="orange")+
  geom_point(data=poly_fun,aes(x=age,y=.pred),size=.05,color="blue")+
  geom_ribbon(data=poly_fun,
              aes(x=age,ymin=.pred_lower,ymax=.pred_upper)
              ,fill="blue",color="blue",alpha=.25,inherit.aes = F)+
  theme_minimal()

Polynomial (logistic) regression

\(P(Y=1|X) = \frac{e^{\beta_{0}+\beta_{1}X+\beta_{2}X^{2}+\beta_{3}X^{3}+\beta_{4}X^{4}}}{1+e^{\beta_{0}+\beta_{1}X+\beta_{2}X^{2}+\beta_{3}X^{3}+\beta_{4}X^{4}}}\)

Same dataset, turn wage into a binary variable: if wage\(_{i}>250\) then high_wage\(_{i}=TRUE\)

Code
w_vs_a |> 
  mutate(high_wage=wage>250) |> 
  ggplot(aes(x=age,y=wage, color=high_wage)) + 
  geom_point() + theme_minimal()

Polynomial (logistic) regression

\(P(Y=1|X) = \frac{e^{\beta_{0}+\beta_{1}X+\beta_{2}X^{2}+\beta_{3}X^{3}+\beta_{4}X^{4}}}{1+e^{\beta_{0}+\beta_{1}X+\beta_{2}X^{2}+\beta_{3}X^{3}+\beta_{4}X^{4}}}\)

Same dataset, turn wage into a binary variable: if wage\(_{i}>250\) then high_wage\(_{i}=TRUE\)

Code
w_vs_a |> 
  mutate(high_wage=wage>250) |> 
  ggplot(aes(x=age,y=as.numeric(high_wage), color=high_wage)) + 
  geom_point() + theme_minimal()

Pre-processing:

Insert the wage transformation into high_wage in the recipe specification.

Note: specify skip=TRUE so that this step is ignored for new data (which contains age only)

Code
prep_poly_log = w_vs_a |> recipe(wage~age) |>
  step_mutate(high_wage = as.factor(wage>250),
              role="outcome",skip=TRUE) |> 
  step_rm(wage) |>
  step_poly(age,degree=4,
            options=list(raw=TRUE))

Model specification

Code
logistic_model_spec = logistic_reg(
  mode="classification")|>
  set_engine("glm")

Create the workflow and fit the model

Code
poly_log_wflow = workflow() |> 
  add_recipe(prep_poly_log) |> 
  add_model(logistic_model_spec)

poly_log_fit = poly_log_wflow |> 
  fit(data=w_vs_a)
Code
poly_log_fit |> tidy() |> 
  knitr::kable(format="html",digits=3) |> 
  kable_styling(font_size=10)
term         estimate  std.error  statistic  p.value
(Intercept)  -109.553     47.627     -2.300    0.021
age_poly_1      8.995      4.184      2.150    0.032
age_poly_2     -0.282      0.135     -2.082    0.037
age_poly_3      0.004      0.002      2.023    0.043
age_poly_4      0.000      0.000     -1.968    0.049

Compute the fit via augment(), and the confidence bands via predict()

Code
fit_data = poly_log_fit |> 
  augment(new_data = grid_age) |> 
  mutate(poly_log_fit |> predict(new_data=grid_age,type="conf_int")) 

fit_data |> 
  select(age,
         ends_with("TRUE")) |> 
  slice(1:10) |> 
  knitr::kable(format="html",digits=5) |> 
  kable_styling(font_size=10)
  age  .pred_TRUE  .pred_lower_TRUE  .pred_upper_TRUE
18.00           0                 0           0.00166
18.01           0                 0           0.00167
18.02           0                 0           0.00167
18.03           0                 0           0.00168
18.04           0                 0           0.00168
18.05           0                 0           0.00169
18.06           0                 0           0.00169
18.07           0                 0           0.00170
18.08           0                 0           0.00170
18.09           0                 0           0.00171

Polynomial logistic fit

Code
w_vs_a |> 
  mutate(high_wage=wage>250,
         high_wage=as.numeric(high_wage)/4) |> 
  ggplot(aes(x=age,y=high_wage,
             color=wage>250))+
  geom_point()+
  theme_minimal() +  
  scale_y_continuous(limits = c(0,.3))+
  geom_point(data=fit_data,
             aes(x=age,y=.pred_TRUE),
             size=.05,color="blue",
              inherit.aes = F)+
  geom_ribbon(data=fit_data,
              aes(x=age,
                  ymin=.pred_lower_TRUE,
                  ymax=.pred_upper_TRUE),
              fill="green",
              color="green",
              alpha=.25,
              inherit.aes = F)

Step functions: regression

Step functions

Global fit

Polynomial regression, and even linear regression, fit \(f(.)\) with \(\hat{f}(.)\) using all the observed data at once.

Local fit

Split the \(X\) range into \(K\) intervals, and fit the function within each interval

Given the breaks \(c_{1},c_{2},\ldots,c_{K}\), one defines the following indicator (step) functions \(C_{j}\), \(j = 0,\ldots,K\). For each observation \(i\), exactly one of the step functions equals \(1\) and all the others equal \(0\), depending on the interval \(x_{i}\) belongs to.

\[\begin{eqnarray} C_{0}(X) &=& I( X<c_{1})\\ C_{1}(X) &=& I( c_{1}\leq X<c_{2})\\ C_{2}(X) &=& I( c_{2}\leq X<c_{3})\\ \ldots \ldots \\ C_{K-1}(X) &=& I( c_{K-1}\leq X<c_{K})\\ C_{K}(X) &=& I( c_{K}\leq X)\\ \end{eqnarray}\]
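
As a quick check of this coding (a sketch with a few hand-picked ages and the same breaks used in the recipe below), the \(C_{j}(X)\) indicators are exactly the dummy columns of the cut factor:

Code
# sketch: each row has a single 1, in the column of its region
age_cut = cut(c(20, 40, 55, 70), breaks = c(0, 35, 50, 65, 80))
model.matrix(~ age_cut - 1)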

Pre-processing: split the predictor age into four intervals

Code
step_regions_plot = w_vs_a |> 
  ggplot(aes(x=age, y=wage))+
  theme_minimal()+
  annotate(geom="rect",xmin=min(w_vs_a$age),xmax = 35,ymin=0,ymax=max(w_vs_a$wage),fill="indianred",alpha=.25)+
  annotate(geom="rect",xmin=35,xmax=50,ymin=0,ymax=max(w_vs_a$wage),fill="lightgreen",alpha=.25)+
  annotate(geom="rect",xmin=50,xmax=65,ymin=0,ymax=max(w_vs_a$wage),fill="lightblue",alpha=.25)+
  annotate(geom="rect",xmin=65,xmax=80,ymin=0,ymax=max(w_vs_a$wage),fill="indianred",alpha=.25)+
  geom_point(alpha=.5)

Pre-processing: split the predictor age into four intervals

Code
step_pre_proc = w_vs_a |> recipe(wage~age) |> 
  step_mutate(age=cut(age,
        breaks = c(0,35,50,65,80))) 
    age       wage
 (0,35]   75.04315
 (0,35]   70.47602
(35,50]  130.98218
(35,50]  154.68529
(35,50]   75.04315
(50,65]  127.11574
(35,50]  169.52854
 (0,35]  111.72085
(35,50]  118.88436
(50,65]  128.68049

Define the workflow and fit the model

Code
lm_wflow=workflow() |> 
  add_recipe(step_pre_proc) |> 
  add_model(linear_model_spec)

step_lm_fit = lm_wflow |> fit(data=w_vs_a)

Compute preds and confidence band

Code
step_fit_data = step_lm_fit |> 
  augment(new_data= grid_age) |> 
  mutate(step_lm_fit |> predict(new_data=grid_age,type="conf_int"))

Update the plot with the step function

Code
step_regions_plot + 
  geom_point(data = step_fit_data, aes(x=age, y=.pred), size=.1,color="magenta",inherit.aes = FALSE)+
  geom_ribbon(data = step_fit_data, aes(x=age, ymin=.pred_lower,ymax=.pred_upper),
              fill="magenta",inherit.aes = FALSE,alpha=.25)

Step functions: classification

Step functions: classification

Split the predictor \(\texttt{age}\) into four intervals and use it to predict whether the wage is higher than 250

Code
prep_step_log = w_vs_a |> recipe(wage~age) |>
  step_mutate(high_wage = as.factor(wage>250),
              role="outcome",skip=TRUE) |> 
  step_mutate(age=cut(age,
        breaks = c(0,35,50,65,80))) |> 
  step_rm(wage)
Code
step_regions_plot + geom_point(data=w_vs_a |> mutate(high_wage=(wage>250)),
                                aes(x=age,y=wage,color=high_wage),inherit.aes = FALSE)

Step functions: classification

Workflow specification and fit

Code
class_step_wf = workflow() |> 
  add_recipe(prep_step_log) |> 
  add_model(logistic_model_spec)

class_step_fit = class_step_wf |>
  fit(data = w_vs_a)
Code
class_fit_data = class_step_fit  |> 
  augment(new_data=grid_age) |> 
  mutate(class_step_fit |>
           predict(new_data=grid_age,
                   type = "conf_int")) |> 
  select(age,ends_with("TRUE"))

Step functions: classification

final plot

Code
w_vs_a |> 
  mutate(high_wage=as.factor(wage>250)) |> 
  ggplot(aes(x=age, y=as.double(wage>250)/4,color=high_wage))+ylab("wage > 250")+
  geom_point(alpha=.25,size=5)+theme_minimal()+
  geom_point(data = class_fit_data, aes(x=age, y=.pred_TRUE), size=.1,color="blue",inherit.aes = FALSE)+
  geom_ribbon(data = class_fit_data, aes(x=age, ymin=.pred_lower_TRUE,ymax=.pred_upper_TRUE),
              fill="lightblue",inherit.aes = FALSE,alpha=.25)+
  annotate(geom="rect",xmin=min(w_vs_a$age),xmax = 35,ymin=0,ymax=.25,fill="indianred",alpha=.25)+
  annotate(geom="rect",xmin=35,xmax=50,ymin=0,ymax=.25,fill="lightgreen",alpha=.25)+
  annotate(geom="rect",xmin=50,xmax=65,ymin=0,ymax=.25,fill="lightblue",alpha=.25)+
  annotate(geom="rect",xmin=65,xmax=80,ymin=0,ymax=.25,fill="indianred",alpha=.25)

Basis functions

Basis functions: a general setup

A basis function \(b_{j}(.)\), \(j=1,\ldots,K\), takes the predictor \(X\) as input.

\[y_{i} = \beta_{0}+ \beta_{1}b_{1}(x_{i}) + \beta_{2}b_{2}(x_{i}) + \ldots + \beta_{K}b_{K}(x_{i})+\epsilon_{i}\]

  • If \(b_{j}(x_{i})=x_{i}^{j}\) then the above is a polynomial regression
  • If \(b_{j}(x_{i})=I(c_{j} \leq x_{i} < c_{j+1})\) then the model is piecewise constant (that is, the \(b_{j}\)’s are step functions)
  • Once the \(b_{j}\)’s are defined, one uses OLS to estimate the coefficients (see the sketch below)
  • One of the most common choices for the \(b_{j}\)’s are the regression splines
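
To make the setup concrete, a minimal sketch: build the \(b_{j}(x_{i})\) columns by hand (a degree 2 polynomial basis, as an example) and estimate the coefficients by OLS via lm():

Code
# sketch: once the basis columns are computed, fitting is an ordinary lm()
basis_df = w_vs_a |>
  mutate(b1 = age,     # b_1(x) = x
         b2 = age^2)   # b_2(x) = x^2
lm(wage ~ b1 + b2, data = basis_df) |> coef()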

Regression splines: piecewise linear

  • Just like for step functions, the range of \(X\) is split into regions, and \(\hat{f}(.)\) is obtained for each region
  • A linear regression is fitted in each region, rather than just the mean as in the step function case

Suppose again you are regressing \(\texttt{wage}\) on \(\texttt{age}\), and that the range of \(\texttt{age}\) is split once, at 50: \(R_{1}:\texttt{age}<50\) and \(R_{2}:\texttt{age}\geq 50\).

Note that the split value \(\texttt{age}=50\) is referred to as a knot.

Regression splines: piecewise linear

Code
w_vs_a |>
  ggplot(aes(x=age,y=wage)) + theme_minimal()+
  geom_point(color="darkgreen")+
  geom_vline(aes(xintercept=50),color="orange")+
  annotate(geom="rect",xmin=min(w_vs_a$age),xmax=50,ymin=0,ymax=max(w_vs_a$wage),fill="indianred",alpha=.25)+
  annotate(geom="rect",xmin=50,xmax=max(w_vs_a$age),ymin=0,ymax=max(w_vs_a$wage),fill="lightblue",alpha=.25)+
  annotate(geom="text",x=22,y=220,label="paste(italic(R),1)",parse=TRUE,size=15,alpha=.3)+
  annotate(geom="text",x=75,y=220,label="paste(italic(R),2)",parse=TRUE,size=15,alpha=.3)->two_regions_plot
two_regions_plot

Two linear regressions, one per region

Turn the data into a list according to the knot; then fit a linear model on each element of the list

Code
simple_reg_recipe = w_vs_a |> recipe(wage~age)

w_vs_a_list = w_vs_a |> 
  mutate(region=age>50,
         age=as.double(age)) |> 
  group_by(region) |> 
  group_map(~workflow() |>
              add_recipe(simple_reg_recipe) |>
              add_model(linear_model_spec) |>
              fit(data = .x) 
            )

Compute the predictions on a grid of age values, to draw the regression lines and bands

Code
fitted_curves_and_band = w_vs_a_list |> map2(.y=grid_age,.f=function(x,.y){
  line_pred=augment(x,new_data=grid_age);
  band_pred=predict(x,new_data=grid_age,type="conf_int");
  return(bind_cols(line_pred,band_pred))
  })

Two linear regressions per region

Code
two_regions_plot+
  geom_line(data=fitted_curves_and_band[[1]] |>
              filter(age<50),
             aes(x=age,y=.pred),
             color="blue", size=1.5,
             inherit.aes = FALSE)+
  geom_line(data=fitted_curves_and_band[[2]] |>
              filter(age>=50),
             aes(x=age,y=.pred),
             color="red", size=1.5,
             inherit.aes = FALSE)

Regression splines: piecewise linear

pre-processing

Code
pre_proc_w_vs_a = w_vs_a |> 
  recipe(wage~age) |>
  step_mutate(age_to_knot = 
                ifelse(age>50,age-50,0))

workflow and fit

Code
pw_linear = workflow() |>
  add_recipe(pre_proc_w_vs_a) |>
  add_model(linear_model_spec)

pw_linear_fit = pw_linear |>
  fit(data=w_vs_a)

Regression splines: piecewise linear

preds and confidence bands

Code
two_reg_fit_data=pw_linear_fit |> 
  augment(new_data=grid_age) |> 
  mutate(pw_linear_fit |> predict(new_data=grid_age,
                                   type="conf_int"))
Code
two_regions_plot+
  geom_line(data=two_reg_fit_data,
             aes(x=age,y=.pred),
             color="blue",
             inherit.aes = FALSE)+
  geom_ribbon(data=two_reg_fit_data,
             aes(x=age,ymin=.pred_lower,
                 ymax=.pred_upper),
             fill="blue",alpha=.25,
             inherit.aes = FALSE)->two_regions_and_fit

Regression splines: piecewise polynomial

  • A polynomial regression is fitted within each region

cubic splines

Within each of the regions defined by the split, fit a degree 3 polynomial regression

\[\texttt{wage}_{i,x_{i}\in R_{1}}= \beta_{0,R_{1}} + \beta_{1,R_{1}} \texttt{age}_{i}+\beta_{2,R_{1}} \texttt{age}^{2}_{i} +\beta_{3,R_{1}} \texttt{age}^{3}_{i}+\epsilon_{i}\]

\[\texttt{wage}_{i,x_{i}\in R_{2}}= \beta_{0,R_{2}} + \beta_{1,R_{2}} \texttt{age}_{i}+\beta_{2,R_{2}} \texttt{age}^{2}_{i} +\beta_{3,R_{2}} \texttt{age}^{3}_{i}+\epsilon_{i}\]

Regression splines: piecewise polynomial

Fit a different polynomial (degree 3) regression within each region, independently of one another

Code
two_poly_recipe = w_vs_a |> 
  mutate(age=as.double(age),
         region=age>50) |> 
  recipe(wage~age) |> 
  step_poly(age,degree=3)

w_vs_a_list_2 = w_vs_a |> 
  mutate(region=age>50,
         age=as.double(age)) |> 
  group_by(region) |> 
  group_map(~workflow() |>
              # add_recipe(two_poly_recipe) |> 
              add_formula(wage ~ poly(age,3)) |>
              add_model(linear_model_spec) |>
              fit(data = .x) 
            )

Combine predictions and confidence bands for each region

Code
fitted_curves_and_band_2 = w_vs_a_list_2 |> map2(.y=grid_age,.f=function(x,.y){
  line_pred=augment(x,new_data=grid_age);
  band_pred=predict(x,new_data=grid_age,type="conf_int");
  return(bind_cols(line_pred,band_pred))
  })

Two polynomial regressions per region

Code
two_regions_plot+
  geom_point(data=fitted_curves_and_band_2[[1]] |> filter(age<50),
             aes(x=age,y=.pred),
             color="blue",
             inherit.aes = FALSE)+
  geom_point(data=fitted_curves_and_band_2[[2]]|> filter(age>=50),
             aes(x=age,y=.pred),
             color="red",
             inherit.aes = FALSE)

Not ideal!

Cubic splines

Let \(\hat{f}_{1}(.)\) and \(\hat{f}_{2}(.)\) be the two degree 3 polynomials fitted in regions \(R_{1}\) and \(R_{2}\), and let \(\xi\) be the only knot considered

Single fitting curve \(\hat{f}(.)\): conditions

  • continuity at the knot: \(\hat{f}_{1}(\xi) = \hat{f}_{2}(\xi)\)

  • \(\hat{f}(.)\) is supposed to be smooth at the knot, too:

    • \(\hat{f}^{\prime}_{1}(\xi) = \hat{f}^{\prime}_{2}(\xi)\)

    • \(\hat{f}^{\prime\prime}_{1}(\xi) = \hat{f}^{\prime\prime}_{2}(\xi)\)

In general a degree \(d\) spline is a piecewise degree \(d\) polynomial

derivatives up to order \(d-1\) are required to be continuous at each knot

Spline basis representation

Basis functions are useful to represent regression splines, including piecewise polynomials.

Suppose there are \(K\) knots

cubic spline via basis functions

\[y_{i}= \beta_{0} +\beta_{1} b_{1}(x_{i}) +\beta_{2} b_{2}(x_{i}) +\ldots +\beta_{K} b_{K}(x_{i})+\ldots+\beta_{K+3} b_{K+3}(x_{i}) +\epsilon_{i}\] for a degree 3 (cubic) polynomial and \(K\) knots, \(K+3\) basis functions \(b_{j}(.)\), \(j=1,\ldots,K+3\), are needed.

choice of \(b_{j}(.)\)’s

  • for each term of the polynomial, the choice is obvious: \(b_{j}(x_{i})=x_{i}^{j}\) , \(j=1,\ldots,d\); for cubic splines \(d=3\)

  • at each knot \(\xi\), a truncated power basis function \(b_{j}(x_{i})= h(x_{i},\xi)\) is used

  • \(h(x_{i},\xi) = (x_{i}-\xi)_{+}^{3} = (x_i-\xi)^{3} \text{ if } x_{i} > \xi, \text{ otherwise } (x_{i}-\xi)_{+}^{3} = 0\)
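
A quick numeric sketch of this construction (hypothetical simulated data, with a single knot at \(\xi=5\)): fitting lm() on the truncated power basis yields a curve that is continuous at the knot:

Code
# sketch: cubic spline with a single knot via the truncated power basis
set.seed(1)
xi = 5
x_sim = runif(200, 0, 10)
y_sim = sin(x_sim) + rnorm(200, sd = .3)
h_sim = pmax(x_sim - xi, 0)^3   # h(x, xi) = (x - xi)_+^3
tp_fit = lm(y_sim ~ x_sim + I(x_sim^2) + I(x_sim^3) + h_sim)
# predictions just left/right of the knot agree (continuity at xi)
x_new = c(xi - 1e-6, xi + 1e-6)
predict(tp_fit, newdata = data.frame(x_sim = x_new,
                                     h_sim = pmax(x_new - xi, 0)^3))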

Truncated power basis: check

Truncated power basis functions: single knot

Suppose there is a single knot at \(\xi\)

\[f(x_{i})= \beta_{0} +\beta_{1} x_{i} +\beta_{2} x_{i}^{2}+\beta_{3} x_{i}^{3} +\beta_{4} (x_{i}-\xi)^{3}_{+}\]

The goal is to fit \(f_{1}(.)\) and \(f_{2}(.)\), two cubic polynomials in \(R_{1}(x_{i}\leq \xi)\) and \(R_{2}(x_{i}> \xi)\), that is

\[f_{1}(x_{i})= a_{1} +b_{1} x_{i} +c_{1} x_{i}^{2}+d_{1}x_{i}^{3}\] \[f_{2}(x_{i})= a_{2} +b_{2} x_{i} +c_{2} x_{i}^{2}+d_{2}x_{i}^{3}\]

First, re-write the coefficients \(a_{r},b_{r},c_{r},d_{r}, r=1,2\) in terms of \(\beta_{j}\)’s such that

\(f_{1}(x_{i})=f(x_{i})\) and \(f_{2}(x_{i})=f(x_{i})\)

Then, check the following conditions that ensure continuity and smoothness at the knot:

  • \(f_{1}(\xi)=f_{2}(\xi)\)

  • \(f^{'}_{1}(\xi)=f^{'}_{2}(\xi)\)

  • \(f^{''}_{1}(\xi)=f^{''}_{2}(\xi)\)

Check continuity at the knot: \(f_{1}(\xi) = f_{2}(\xi)\)

\(f_{1}(x_{i})=f(x_{i})\)

Define \(a_{1},b_{1},c_{1},d_{1}\) ; note that, in \(R_{1}(x_{i}\leq \xi)\) , \((x_{i}-\xi)^{3}_{+}=0\) , therefore

\[f_{1}(x_{i}) = f(x_{i})\] \[a_{1} + b_{1} x_{i} + c_{1} x_{i}^{2} + d_{1}x_{i}^{3} = \beta_{0} + \beta_{1} x_{i} + \beta_{2} x_{i}^{2}+ \beta_{3} x_{i}^{3}\]

and \(a_{1}=\beta_{0}\) , \(b_{1}=\beta_{1}\) , \(c_{1}=\beta_{2}\) , \(d_{1}=\beta_{3}\) .

Check continuity at the knot

\(f_{2}(x_{i})=f(x_{i})\)

Define \(a_{2},b_{2},c_{2},d_{2}\); note that, in \(R_{2}(x_{i}> \xi)\),

\[(x_{i}-\xi)^{3}_{+}=(x_{i}-\xi)^{3}\] \[f_{2}(x_{i}) = \beta_{0} + \beta_{1} x_{i} + \beta_{2} x_{i}^{2}+ \beta_{3} x_{i}^{3}+\beta_{4}(x_{i}-\xi)^{3}\] re-writing \[(x_{i}-\xi)^{3}=x_{i}^{3} + 3x_{i}\xi^{2}-3x_{i}^{2}\xi - \xi^{3}\]

and plugging it in

\[a_{2} + b_{2} x_{i} + c_{2} x_{i}^{2} + d_{2}x_{i}^{3} = \beta_{0} + \beta_{1} x_{i} + \beta_{2} x_{i}^{2}+ \beta_{3} x_{i}^{3}+\beta_{4}x_{i}^{3} + 3\beta_{4}x_{i}\xi^{2}-3\beta_{4}x_{i}^{2}\xi - \beta_{4}\xi^{3}\] then \(a_{2}=\beta_{0} - \beta_{4} \xi^{3}\) , \(b_{2}=\beta_{1}+ 3 \beta_{4} \xi^{2}\) , \(c_{2}=\beta_{2} -3 \beta_{4} \xi\) and \(d_{2}=\beta_{3} + \beta_{4}\)

Check continuity at the knot

\[a_{1} +b_{1} \xi +c_{1} \xi^{2}+d_{1} \xi^{3} = a_{2} +b_{2} \xi +c_{2} \xi^{2}+d_{2} \xi^{3}\] recalling the definitions of \(a_{1},\ldots, d_{2}\), the above is re-written as

\[\begin{align} \beta_{0} +\beta_{1} \xi +\beta_{2} \xi^{2}+\beta_{3} \xi^{3} &= \underbrace{\beta_{0} - \beta_{4}\xi^{3}}_{a_{2}} +\underbrace{(\beta_{1}+ 3 \beta_{4} \xi^{2})}_{b_{2}} \xi + \underbrace{(\beta_{2} -3 \beta_{4} \xi)}_{c_{2}} \xi^{2}+\underbrace{(\beta_{3} + \beta_{4})}_{d_{2}} \xi^{3}=\\ &= \beta_{0} - \beta_{4}\xi^{3}+\beta_{1}\xi+ 3 \beta_{4} \xi^{3}+\beta_{2}\xi^{2} -3 \beta_{4} \xi^{3}+ \beta_{3}\xi^{3} + \beta_{4}\xi^{3}=\\ &= \beta_{0} +\beta_{1}\xi+ \beta_{2}\xi^{2} +\beta_{3}\xi^{3} \end{align}\]

Check smoothness: \(f^{'}_{1}(\xi) = f^{'}_{2}(\xi)\)

\(f^{'}_{1}(\xi) = b_{1} + 2 c_{1}\xi+3 d_{1}\xi^{2}\) and \(f^{'}_{2}(\xi) = b_{2}+ 2 c_{2}\xi+3 d_{2}\xi^{2}\)

recalling the definitions of \(a_{1},\ldots, d_{2}\), the above is re-written as

\[\begin{align} \beta_{1} +2\beta_{2} \xi+3\beta_{3} \xi^{2} &= \underbrace{(\beta_{1}+ 3 \beta_{4} \xi^{2})}_{b_{2}} + 2\underbrace{(\beta_{2} -3 \beta_{4} \xi)}_{c_{2}} \xi+3\underbrace{(\beta_{3} + \beta_{4})}_{d_{2}} \xi^{2}=\\ &= \beta_{1}+ 3 \beta_{4} \xi^{2} + 2\beta_{2}\xi - 6 \beta_{4} \xi^{2}+ 3\beta_{3}\xi^{2}+3\beta_{4}\xi^{2}=\\ &= \beta_{1} +2\beta_{2} \xi+3\beta_{3} \xi^{2} \end{align}\]

Check smoothness: \(f^{''}_{1}(\xi) = f^{''}_{2}(\xi)\)

\(f^{''}_{1}(\xi) = 2c_{1}+6 d_{1}\xi\) and \(f^{''}_{2}(\xi) = 2c_{2} + 6 d_{2}\xi\)

recalling the definitions of \(a_{1},\ldots, d_{2}\), the above is re-written as

\[\begin{align} 2\beta_{2} +6\beta_{3} \xi &= 2\underbrace{(\beta_{2} -3 \beta_{4} \xi)}_{c_{2}}+6\underbrace{(\beta_{3} + \beta_{4})}_{d_{2}} \xi=\\ &= 2\beta_{2} - 6 \beta_{4} \xi+ 6\beta_{3}\xi+6\beta_{4}\xi=\\ &= 2\beta_{2} + 6\beta_{3}\xi \end{align}\]

…end of check

Cubic spline

To fit a cubic spline, one has to specify the basis functions, and then fit a linear model

  • the basis functions \(b_{j}(.)'s\) are defined in the \(\texttt{recipe}\) (pre-processing)

  • the model specification and fit do not change

cubic spline with a single knot at \(\texttt{age}=50\)

Code
pre_proc_w_vs_a = w_vs_a |>
  recipe(wage~age) |>
  step_bs(age,degree = 3,options= list( knots = 50))

model specification and fit

Code
cubic_spline_wf = workflow() |>
  add_recipe(pre_proc_w_vs_a) |>
  add_model(linear_model_spec)

cubic_spline_fit = cubic_spline_wf |>
  fit(data=w_vs_a)

Cubic spline

curve and bands data

Code
cubic_spline_preds = cubic_spline_fit |> 
  augment(new_data=grid_age) |> 
  mutate(cubic_spline_fit |> 
           predict(new_data=grid_age,type="conf_int"))
Code
two_regions_plot+
  geom_point(data=cubic_spline_preds,aes(x=age,y=.pred),
             size=.05,color="blue", inherit.aes = F)+
  geom_ribbon(data=cubic_spline_preds,
              aes(x=age,ymin=.pred_lower,ymax=.pred_upper),
              fill="green",color="green",alpha=.25,
              inherit.aes = F)+
  theme_minimal() -> cubic_spline_fit_plot

Cubic splines: training sample sizes

Code
tibble(sizes=c(50,100,250,500,1000,2000,3000)) |> 
  mutate(sizes=as.list(sizes), 
         training_data = sizes |> 
           map(~return(w_vs_a |> 
                         slice(sample(1:n(),.x))))
         ) |> 
  mutate(grid_ages = sizes |> map(~grid_age)
         ) |> 
  mutate(cubic_fit = map2(.x=training_data,.y=grid_ages,
              .f=function(x=.x,y=.y){ 
              pre_proc_w_vs_a = x |> 
               recipe(wage~age) |>
                step_bs(age,degree = 3,
                options= list( knots = 50));   
             cubic_spline_wf = workflow() |>
               add_recipe(pre_proc_w_vs_a) |>
               add_model(linear_model_spec);
             cubic_spline_fit = cubic_spline_wf |>
               fit(data=x)
             return(cubic_spline_fit)}
             )
         ) |> 
   mutate(cubic_preds = map2(.x=cubic_fit,.y=grid_ages,
             .f=function(x=.x,y=.y){                                
             cubic_spline_preds = x |>
               augment(new_data=y) |>
               mutate(x |>
                        predict(new_data=y,
                                type="conf_int"));
             return(cubic_spline_preds)
           }
           )
   ) |> 
  mutate(training_size=sizes) -> sizes_exp

Cubic splines

Pull the training sets from the list, labeled by size

Code
training_points = sizes_exp |> 
  select(starts_with("training")) |>
  unnest(everything()) |> 
  mutate(training_size=as.factor(training_size))

Pull the cubic preds from the list, labeled by size

Code
curves_data = sizes_exp |> 
  select(training_size,cubic_preds) |>
  unnest(everything()) |> 
  mutate(training_size=as.factor(training_size))

Cubic splines: training sample sizes

Code
fit_by_sizes_plot = w_vs_a |> 
  ggplot(aes(x=age,y=wage))+theme_minimal()+geom_point(size=1,alpha=.4,color="lightgrey")+
  geom_point(data=training_points,aes(x=age,y=wage),size=1,alpha=.25,color="green",inherit.aes = FALSE)+
  geom_line(data=curves_data,aes(x=age,y=.pred), color="red",inherit.aes = FALSE)+
  geom_ribbon(data=curves_data,aes(x=age,ymin=.pred_lower,ymax=.pred_upper),fill="cyan",color="blue",alpha=.2,inherit.aes = FALSE)+
  facet_wrap(~training_size,scales = "free")

Cubic splines: varying number of knots

Code
tibble(degrees_freedom=5:10) |> 
  mutate(degrees_freedom=as.list(degrees_freedom), 
         training_data =  map(.x=500,.f=~return(w_vs_a |> 
                         slice(sample(1:n(),.x))))
         ) |> 
  mutate(grid_ages = degrees_freedom |> map(~grid_age)
         ) |> 
  mutate(cubic_fit = pmap(.l=list(x=training_data,
                               y=grid_ages,
                               z=degrees_freedom),
              .f=function(x,y,z){ 
              pre_proc_w_vs_a = x |> 
               recipe(wage~age) |>
                step_bs(age,degree = 3,
                deg_free=z);   
             cubic_spline_wf = workflow() |>
               add_recipe(pre_proc_w_vs_a) |>
               add_model(linear_model_spec);
             cubic_spline_fit = cubic_spline_wf |>
               fit(data=x)
             return(cubic_spline_fit)}
             )
         ) |> 
   mutate(cubic_preds = map2(.x=cubic_fit,.y=grid_ages,
             .f=function(x=.x,y=.y){                                
             cubic_spline_preds = x |>
               augment(new_data=y) |>
               mutate(x |>
                        predict(new_data=y,
                                type="conf_int"));
             return(cubic_spline_preds)
           }
           )
   )-> knots_exp

Cubic splines

Compute the number of knots from the degrees of freedom: \(knots= df-d-1\), where \(d\) is the degree of the polynomial

Code
knots_exp = knots_exp |>  
  unnest(degrees_freedom) |> 
  mutate(n_knots=degrees_freedom-4)

Pull the training sets from the list, labeled by the number of knots

Code
training_points = knots_exp |> 
  select(training_data,n_knots) |>
  unnest(everything()) |> 
  mutate(n_knots=as.factor(n_knots))

Pull the cubic preds from the list, labeled by the number of knots

Code
curves_data = knots_exp |> 
  select(n_knots,cubic_preds) |>
  unnest(everything()) |> 
  mutate(n_knots=as.factor(n_knots))

Cubic splines: 1 to 6 knots

Code
fit_by_knots_plot = w_vs_a |> 
  ggplot(aes(x=age,y=wage))+theme_minimal()+geom_point(size=1,alpha=.4,color="lightgrey")+
  geom_point(data=training_points,aes(x=age,y=wage),size=1,alpha=.25,color="green",inherit.aes = FALSE)+
  geom_line(data=curves_data,aes(x=age,y=.pred), color="red",inherit.aes = FALSE)+
  geom_ribbon(data=curves_data,aes(x=age,ymin=.pred_lower,ymax=.pred_upper),fill="cyan",color="blue",alpha=.2,inherit.aes = FALSE)+
  facet_wrap(~n_knots,scales = "free")

Cubic splines and natural cubic splines

  • The higher the number of knots, the higher the flexibility of \(\hat{f}(.)\) (and its variability)

  • For larger training set sizes, a higher number of knots can be selected (e.g. via cross-validation).

  • In the outer range of \(X\) there are usually fewer observations, and splines can have high variance

  • To tackle this drawback, one can constrain the function to be linear in the boundary regions

  • A cubic spline with these additional linearity constraints is called a natural cubic spline

Cubic splines and natural cubic splines

Code
w_vs_a_sample = w_vs_a |> slice_sample(n=250)

cubic spline

natural spline

pre-processing

Code
pre_proc_w_vs_a_bs = w_vs_a_sample |>
  recipe(wage~age) |>
  step_bs(age,deg_free = 7)
Code
pre_proc_w_vs_a_ns = w_vs_a_sample |>
  recipe(wage~age) |>
  step_ns(age,deg_free = 7)


model specification and fit

Code
cubic_spline_wf = workflow() |>
  add_recipe(pre_proc_w_vs_a_bs) |>
  add_model(linear_model_spec)

cubic_spline_fit = cubic_spline_wf |>
  fit(data=w_vs_a_sample)
Code
natural_spline_wf = workflow() |>
  add_recipe(pre_proc_w_vs_a_ns) |>
  add_model(linear_model_spec)

natural_spline_fit = natural_spline_wf |>
  fit(data=w_vs_a_sample)

curve and bands data

Code
cubic_spline_preds = cubic_spline_fit |> 
  augment(new_data=grid_age) |> 
  mutate(cubic_spline_fit |> 
           predict(new_data=grid_age,type="conf_int"))
Code
natural_spline_preds = natural_spline_fit |> 
  augment(new_data=grid_age) |> 
  mutate(natural_spline_fit |> 
           predict(new_data=grid_age,type="conf_int"))

cubic vs natural fit

Code
cub_vs_nat = rbind(cubic_spline_preds |> mutate(method="cubic"),
                   natural_spline_preds |> mutate(method="natural"))

w_vs_a |> 
  ggplot(aes(x=age,y=wage))+theme_minimal()+geom_point(size=1,alpha=.2,color="lightgrey")+
  geom_point(data=w_vs_a_sample,aes(x=age,y=wage),size=1,color="lightgreen",inherit.aes = FALSE)+
  scale_linetype_manual(values = c("dotted","solid"))+
  geom_line(data=cub_vs_nat,aes(x=age,y=.pred, colour = method,linetype=method),inherit.aes = FALSE)+
  geom_ribbon(data=cub_vs_nat,aes(x=age,ymin=.pred_lower,ymax=.pred_upper,color=method,linetype=method),alpha=.05,inherit.aes = FALSE)

Smoothing splines

Smoothing vs regression splines

In regression splines

  • pick \(K\) knots
  • define a basis function within each of the resulting regions
  • estimate the spline coefficients via least squares

In smoothing splines

  • find \(g(\cdot)\) that minimizes the loss function \[\sum_{i=1}^{n}\left(y_{i}-g(x_{i})\right)^{2}\]
  • with no restrictions, one could pick \(g(\cdot)\) so that \(\sum_{i=1}^{n}\left(y_{i}-g(x_{i})\right)^{2}=0\)
  • the fitted curve would then interpolate the training observations (massive overfitting)

  • to avoid this, a roughness penalty is included

Smoothing splines

the minimization problem is \(loss+penalty\), that is \[\sum_{i=1}^{n}\left(y_{i}-g(x_{i})\right)^{2}+ \lambda\int g''(t)^{2} dt\] where \(g''(t)\) is the second derivative of \(g(\cdot)\) at \(t\), indicating whether the slope of the function is changing (increasing/decreasing) or not

  • the rougher \(g(\cdot)\), the larger the value of \(\int g''(t)^{2} dt\);

  • \(\lambda\) is the penalty parameter that dictates how smooth \(g(\cdot)\) will be

(effective) degrees of freedom

A smoothing spline has \(n\) nominal degrees of freedom since it has \(n\) parameters (equivalent to one knot per training observation).

  • Because of the roughness penalty, however, the \(n\) parameters are heavily shrunk.

  • To measure the model complexity/flexibility, the effective degrees of freedom \(df_{\lambda}\) are used instead

  • Consider \({\bf \hat g}_{\lambda}\) to be the vector of estimated values (one per \(x_{i}\), \(i = 1,\ldots, n\)) for a specific \(\lambda\) value; it is obtained as \[{\bf \hat g}_{\lambda} = {\bf S}_{\lambda}{\bf y}\] where \({\bf S}_{\lambda}\) is the smoothing matrix, that is, the smoothing spline analogue of the hat matrix \({\bf H} = {\bf X}\left({\bf X}^{\sf T}{\bf X}\right)^{-1}{\bf X}^{\sf T}\) in the regression case.

  • the effective degrees of freedom are then given by \(df_{\lambda}=trace\left({\bf S}_{\lambda}\right)\)
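
This can be checked empirically: smooth.spline() returns the diagonal values of \({\bf S}_{\lambda}\) in its lev component (the leverages), so their sum should recover the reported effective degrees of freedom. A minimal sketch on the wage data:

Code
# sketch: trace(S_lambda) = sum of the leverages = effective df
fit_df5 = smooth.spline(x = w_vs_a$age, y = w_vs_a$wage, df = 5)
c(reported_df = fit_df5$df, trace_S = sum(fit_df5$lev))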

Direct computation of the leave-one-out cross-validation estimate of the test error

In fact, the LOOCV estimate of the test RSS can be computed directly as

\[RSS_{cv}(\lambda)=\sum_{i =1}^{n}{\left(y_{i}-\hat{g}^{(-i)}_{\lambda}(x_{i})\right)^{2}}=\sum_{i =1}^{n}{\left[\frac{y_{i}-\hat{g}_{\lambda}(x_{i})}{1-\left\{{\bf S}_{\lambda}\right\}_{ii}} \right]^{2}}\]

where \(\hat{g}^{(-i)}_{\lambda}(\cdot)\) is the function fitted on all but the \(i^{th}\) training observation, and \(\left\{{\bf S}_{\lambda}\right\}_{ii}\) is the \(i^{th}\) diagonal element of the matrix \({\bf S}_{\lambda}\)
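
The same shortcut holds for any linear smoother; in ordinary least squares the hat matrix \({\bf H}\) plays the role of \({\bf S}_{\lambda}\). A minimal lm() sketch verifying the identity on the wage data:

Code
# sketch: LOOCV residual sum of squares from a single fit, via hat values
lm_fit = lm(wage ~ age, data = w_vs_a)
h_ii = hatvalues(lm_fit)                 # diagonal of the hat matrix H
rss_cv = sum((residuals(lm_fit) / (1 - h_ii))^2)
# brute force: refit n times, leaving one observation out each time
rss_loo = sum(sapply(seq_len(nrow(w_vs_a)), function(i) {
  f = lm(wage ~ age, data = w_vs_a[-i, ])
  (w_vs_a$wage[i] - predict(f, w_vs_a[i, ]))^2
}))
c(shortcut = rss_cv, brute_force = rss_loo)  # equal up to rounding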

Code
wage_smooth_df_32 <- smooth.spline(x = w_vs_a$age, y = w_vs_a$wage, df = 32)
wage_smooth_df_16 <- smooth.spline(x = w_vs_a$age, y = w_vs_a$wage, df = 16)
wage_smooth_cv <- smooth.spline(x = w_vs_a$age, y = w_vs_a$wage, cv = TRUE)
tibble(wage_smooth_df_32$df,wage_smooth_df_16$df,wage_smooth_cv$df)|> kbl()
wage_smooth_df_32$df  wage_smooth_df_16$df  wage_smooth_cv$df
            31.99591              16.00237           6.794596

Collect the predictions from the three fitted models

Code
age_grid=seq(from=min(w_vs_a$age),to=max(w_vs_a$age),by=.1)

all_preds = bind_rows(
  as_tibble(predict(wage_smooth_df_32, age_grid)) |>
    mutate(model = "32 degrees of freedom"),
    as_tibble(predict(wage_smooth_df_16, age_grid)) |>
    mutate(model = "16 degrees of freedom"),
  as_tibble(predict(wage_smooth_cv, age_grid)) |>
    mutate(model = "LOOCV")
)

Plot the three smoothing splines

Code
w_vs_a |> 
  ggplot(aes(x=age,y=wage))+theme_minimal()+geom_point(size=1,alpha=.2,color="lightgrey")+
  geom_line(data = all_preds, aes(x=x,y=y,color=model))

Local regression

Local regression

The local regression fit \(\hat{f}(x_{0})\) depends on the span, that is, the proportion of points considered as neighbors of \(x_{0}\).

In particular, a weighted linear regression is fitted by minimizing \[\sum_{i=1}^{n}{K_{i0}}(y_{i}-\beta_{0}-\beta_{1}x_{i})^{2}\]

where \(K_{i0}\) is a weight proportional to the closeness of the \(x_{i}\)’s to \(x_{0}\): for the neighbors \(K_{i0}>0\), for the non-neighbors \(K_{i0}=0\)

the weighted least squares minimizers \(\hat{\beta_{0}}\) and \(\hat{\beta_{1}}\) are then used to get the estimate of the function at \(x_{0}\)

\[\hat{f}(x_{0})=\hat{\beta_{0}}+ \hat{\beta_{1}}x_{0}\]
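
A minimal sketch of one such local fit, with a hypothetical target point \(x_{0}=40\), span 0.5, and tricube weights (the weighting function used by loess()):

Code
# sketch: weighted least squares around a single target point x0
x0 = 40
span = 0.5
d = abs(w_vs_a$age - x0)
d0 = sort(d)[ceiling(span * nrow(w_vs_a))]     # radius containing span*n points
K_i0 = ifelse(d < d0, (1 - (d / d0)^3)^3, 0)   # tricube weights; 0 for non-neighbors
local_fit = lm(wage ~ age, data = w_vs_a, weights = K_i0)
predict(local_fit, newdata = data.frame(age = x0))  # f_hat(x0)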

Local regression: tuning parameter(s)

  • the weighting function that sets the values of \(K_{i0}\)
  • the form of the local fit: a linear function is an option, but it could also be constant or quadratic

However, the parameter affecting \(\hat{f}\) the most is the span: a lower value corresponds to a smaller number of neighbors, so \(\hat{f}\) will be more flexible

Code
loess_data= tibble(span=c(.1,.3,.5,.7,.9)) |>
  mutate(model_fit=map(.x=span,~loess(wage ~ age, data = w_vs_a, span = .x)),
         preds=map(.x=model_fit,~predict(.x,age_grid)),
         span=ordered(span),
         age = map(1:5,\(x) age_grid)
  ) |> select(-model_fit) |> unnest(c(preds, age))

Local regression: tuning parameter(s)

  • the weighting function that sets the values of \(K_{i0}\)
  • the form of the local fit: a linear function is an option, but it could also be constant or quadratic

However, the parameter affecting \(\hat{f}\) the most is the span: a lower value corresponds to a smaller number of neighbors, so \(\hat{f}\) will be more flexible

Code
loess_plot = w_vs_a |> 
  ggplot(aes(x=age,y=wage))+theme_minimal()+
  geom_point(size=1,alpha=.2,color="lightgrey")+
  geom_line(data = loess_data, aes(x=age,y=preds,color=span))

Generalized additive models: putting it all together

GAM

Generalized additive models (GAMs) extend the standard linear model by replacing each linear term with a (possibly) non-linear function \(f_{j}\) of the corresponding predictor

\[y_{i} = \beta_{0}+f_{1}(x_{i1})+f_{2}(x_{i2})+\ldots+f_{p}(x_{ip})+\epsilon_{i}\]

Code
library(gam)
library(splines)
data(Wage,package = "ISLR2")

wage_gam_ns_fit <- gam(wage ~ ns(year, 4) + ns(age, 5) + education, data = Wage)
d <- preplot(wage_gam_ns_fit)

plot_structure=tibble(plot_name=names(d) , data = d |> map(~tibble(x=.$x,y=.$y,se_y=.$se.y) |> 
                                                              mutate(y_lower=y-se_y,
                                                                     y_upper=y+se_y)
)
) |> mutate(plots=map2(.x = data,.y=plot_name,
                        function(x=.x,y=.y){
                          if(!is.factor(x$x)){
                            myplot=x |> ggplot(aes(x)) +
                              geom_line(aes(y = y), color = "red") +
                              geom_line(aes(y = y_lower), lty = 2, color = "red") +
                              geom_line(aes(y = y_upper), lty = 2, color = "red") +
                              ggtitle(y)+ theme(axis.text.x=element_text(angle=90)) 
                          }else{
                            myplot = x |> ggplot(aes(x)) +
                              geom_errorbar(aes(y = y, ymin = y_lower, ymax = y_upper),color="red") +
                              ggtitle(y)+ theme(axis.text.x=element_text(angle=90)) 
                          }
                        }
)
)

GAM

Code
plot_structure$plots[[1]] + plot_structure$plots[[2]] + plot_structure$plots[[3]]

GAM

Code
wage_gam_smooth_fit <-
  gam(wage ~ s(year, 4) + s(age, 5) + education,
      data = Wage)

d <- preplot(wage_gam_smooth_fit)

plot_structure=tibble(plot_name=names(d) , data = d |> map(~tibble(x=.$x,y=.$y,se_y=.$se.y) |> 
                                                              mutate(y_lower=y-se_y,
                                                                     y_upper=y+se_y)
)
) |> mutate(plots=map2(.x = data,.y=plot_name,
                        function(x=.x,y=.y){
                          if(!is.factor(x$x)){
                            myplot=x |> ggplot(aes(x)) +
                              geom_line(aes(y = y), color = "blue") +
                              geom_line(aes(y = y_lower), lty = 2, color = "blue") +
                              geom_line(aes(y = y_upper), lty = 2, color = "blue") +
                              ggtitle(y)+ theme(axis.text.x=element_text(angle=90)) 
                          }else{
                            myplot = x |> ggplot(aes(x)) +
                              geom_errorbar(aes(y = y, ymin = y_lower, ymax = y_upper),color="blue") +
                              ggtitle(y)+ theme(axis.text.x=element_text(angle=90)) 
                          }
                        }
)
)

GAM

Code
plot_structure$plots[[1]] + plot_structure$plots[[2]] + plot_structure$plots[[3]]