\(y=f(X)+\epsilon\)

 

this is what it is all about…

\(y=f(X)+\epsilon\)

 

the response one is interested in

\(y=f(X)+\epsilon\)

 

the features that are used to estimate \(y\)

\(y=f(X)+\epsilon\)

 

the function that links the \(X\)’s to \(y\)

\(y=f(X)+\epsilon\)

 

the error: the \(X\)’s cannot explain \(y\) exactly

\(y=f(X)+\epsilon\)

 

the goal: find \(\hat{f}(\cdot)\) to estimate \(f(\cdot)\)

What to do, and how

 

\(y\)

regression: if the response is numeric

classification: if the response is non-numeric (categorical)

 

 

goal

inference: study the effects of \(X\)’s on \(y\)

prediction: estimate \(y\), given new values of the \(X\)’s

 

 

fit strategy

parametric: assume a functional form for \(f(\cdot)\) and fit its parameters

non-parametric: fit \(f(\cdot)\) (almost) point-by-point (the sketch below contrasts the two strategies).
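
As a minimal sketch of the two strategies (on simulated data; the object names below are illustrative), a straight line can be fit parametrically with \(\texttt{lm()}\), while \(\texttt{loess()}\) estimates the same relationship locally:

Code
set.seed(1)
x = runif(100, 0, 10)
y = sin(x) + x / 2 + rnorm(100, sd = .3)
fit_param    = lm(y ~ x)     # parametric: assume f() is a straight line and estimate its two parameters
fit_nonparam = loess(y ~ x)  # non-parametric: estimate f() locally, (almost) point-by-point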

basics: linear regression

guess \(y\)? (no \(X\) available)

Code
set.seed(123)
toy_data = tibble(y = runif(30,min=0,max = 10),x_no=0,x_yes = ((y-3)/3)-rnorm(30,sd=.5)) |> 
  pivot_longer(cols = x_no : x_yes, values_to = "x",names_to="states")

toy_data |> filter(states == "x_no") |> ggplot(aes(y=y,x=x)) + 
  geom_point(color="indianred",size=3,alpha=.75) + 
  expand_limits(y=c(0,10),x=c(-2,3)) + geom_hline(yintercept = mean(toy_data$y),color="forestgreen")

guess \(y\)? (\(X\) available)

Code
toy_data |> ggplot(aes(y=y,x=x))+
  geom_point(color="indianred",size=3,alpha=.75)+
  expand_limits(y=c(0,10),x=c(-2,3)) + 
  geom_hline(yintercept = mean(toy_data$y),color="forestgreen") +
  transition_states(states)

guess \(y\)? (\(X\) available)

Code
toy_data |> filter(states == "x_yes") |> ggplot(aes(y=y,x=x)) + 
  geom_point(color="indianred",size=3,alpha=.75)  + geom_smooth(method = "lm") + 
  geom_hline(yintercept = mean(toy_data$y),color="forestgreen")+
  expand_limits(y=c(0,10),x=c(-2,3)) 

\(y=f(X)+\epsilon\)

 

the response is a numeric (continuous) variable

\(y=f(X)+\epsilon\)

 

the features can be numeric and categorical

\(y=f(X)+\epsilon\)

 

the function is linear: \(y=\beta_{0}+\sum_{j=1}^{p}\beta_{j}X_{j}+\epsilon\)

\(y=f(X)+\epsilon\)

 

the errors are assumed to be \(\epsilon_{i}\sim N(0,\sigma^{2})\), with \(cov(\epsilon_{i},\epsilon_{i'})=0\) for \(i\neq i'\)

\(y=f(X)+\epsilon\)

 

the goal: finding \(\hat{f}(\cdot)\) boils down to finding \(\hat{\beta}_{0}\) and the \(\hat{\beta}_{j}\)'s

prologue

stuff you might already know

supervised learning flow

the bias variance trade-off

assess model performance

end of prologue

Advertising data

Code
adv_data = read_csv(file="./data/Advertising.csv") |> select(-1)
adv_data |> slice_sample(n = 8) |> 
  kbl() |> kable_styling(font_size=10)
TV radio newspaper sales
76.4 0.8 14.8 9.4
66.9 11.7 36.8 9.7
265.6 20.0 0.3 17.4
151.5 41.3 58.5 18.5
23.8 35.1 65.9 9.2
237.4 27.5 11.0 18.9
7.8 38.9 50.6 6.6
197.6 23.3 14.2 16.6
  • sales is the response, indicating the level of sales in a specific market
  • TV, radio and newspaper are the predictors, indicating the advertising budget spent on the corresponding medium

Advertising data

Code
adv_data_tidy = adv_data |> pivot_longer(names_to="medium",values_to="budget",cols = 1:3) 
adv_data_tidy |> slice_sample(n = 8) |> 
  kbl() |> kable_styling(font_size=10)
sales medium budget
14.8 radio 10.1
19.4 radio 33.5
10.5 radio 16.0
11.8 newspaper 23.7
11.7 newspaper 5.9
19.2 TV 193.7
16.9 TV 163.3
16.6 TV 202.5

Advertising data

Code
adv_data_tidy |> 
  ggplot(aes(x = budget, y = sales)) + theme_minimal() + facet_wrap(~medium,scales = "free") +
  geom_point(color="indianred",alpha=.5,size=3)

Note: the budget spent on TV reaches about 300, while the radio and newspaper budgets are considerably smaller

Advertising data: single regressions

We can be naive and regress sales on TV, newspaper and radio, separately.
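
Before the nested, tidyverse-based version shown next, the same three single regressions can be fit directly with \(\texttt{lm()}\) on the wide \(\texttt{adv_data}\) (a minimal sketch; the object names are illustrative):

Code
# three separate single regressions, one per medium
lm_tv        = lm(sales ~ TV,        data = adv_data)
lm_radio     = lm(sales ~ radio,     data = adv_data)
lm_newspaper = lm(sales ~ newspaper, data = adv_data)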

Advertising data: single regressions

Code
avd_lm_models_nest = adv_data_tidy |> 
                  group_by(medium) |> 
                  group_nest(.key = "datasets") |> 
                  mutate(
                    model_output = map(.x=datasets,~lm(sales~budget,data=.x)),
                         model_params = map(.x=model_output, ~tidy(.x)),
                         model_metrics = map(.x=model_output, ~glance(.x)),
                         model_fitted = map(.x=model_output, ~augment(.x))
                  )
# A tibble: 3 × 2
  medium              datasets
  <chr>     <list<tibble[,2]>>
1 TV                 [200 × 2]
2 newspaper          [200 × 2]
3 radio              [200 × 2]

Advertising data: single regressions

Code
avd_lm_models_nest = adv_data_tidy |> 
                  group_by(medium) |> 
                  group_nest(.key = "datasets") |> 
                  mutate(
                    model_output = map(.x=datasets,~lm(sales~budget,data=.x)),
                         model_params = map(.x=model_output, ~tidy(.x)),
                         model_metrics = map(.x=model_output, ~glance(.x)),
                         model_fitted = map(.x=model_output, ~augment(.x))
                  )
# A tibble: 3 × 3
  medium              datasets model_output
  <chr>     <list<tibble[,2]>> <list>      
1 TV                 [200 × 2] <lm>        
2 newspaper          [200 × 2] <lm>        
3 radio              [200 × 2] <lm>        

Advertising data: single regressions

Code
avd_lm_models_nest = adv_data_tidy |> 
                  group_by(medium) |> 
                  group_nest(.key = "datasets") |> 
                  mutate(
                    model_output = map(.x=datasets,~lm(sales~budget,data=.x)),
                         model_params = map(.x=model_output, ~tidy(.x)),
                         model_metrics = map(.x=model_output, ~glance(.x)),
                         model_fitted = map(.x=model_output, ~augment(.x))
                  )
# A tibble: 3 × 4
  medium              datasets model_output model_params    
  <chr>     <list<tibble[,2]>> <list>       <list>          
1 TV                 [200 × 2] <lm>         <tibble [2 × 5]>
2 newspaper          [200 × 2] <lm>         <tibble [2 × 5]>
3 radio              [200 × 2] <lm>         <tibble [2 × 5]>

Advertising data: single regressions

Code
avd_lm_models_nest = adv_data_tidy |> 
                  group_by(medium) |> 
                  group_nest(.key = "datasets") |> 
                  mutate(
                    model_output = map(.x=datasets,~lm(sales~budget,data=.x)),
                         model_params = map(.x=model_output, ~tidy(.x)),
                         model_metrics = map(.x=model_output, ~glance(.x)),
                         model_fitted = map(.x=model_output, ~augment(.x))
                  )
# A tibble: 3 × 5
  medium              datasets model_output model_params     model_metrics    
  <chr>     <list<tibble[,2]>> <list>       <list>           <list>           
1 TV                 [200 × 2] <lm>         <tibble [2 × 5]> <tibble [1 × 12]>
2 newspaper          [200 × 2] <lm>         <tibble [2 × 5]> <tibble [1 × 12]>
3 radio              [200 × 2] <lm>         <tibble [2 × 5]> <tibble [1 × 12]>

Advertising data: single regressions

Code
avd_lm_models_nest = adv_data_tidy |> 
                  group_by(medium) |> 
                  group_nest(.key = "datasets") |> 
                  mutate(
                    model_output = map(.x=datasets,~lm(sales~budget,data=.x)),
                         model_params = map(.x=model_output, ~tidy(.x)),
                         model_metrics = map(.x=model_output, ~glance(.x)),
                         model_fitted = map(.x=model_output, ~augment(.x))
                  )
# A tibble: 3 × 6
  medium           datasets model_output model_params model_metrics model_fitted
  <chr>     <list<tibble[,> <list>       <list>       <list>        <list>      
1 TV              [200 × 2] <lm>         <tibble>     <tibble>      <tibble>    
2 newspaper       [200 × 2] <lm>         <tibble>     <tibble>      <tibble>    
3 radio           [200 × 2] <lm>         <tibble>     <tibble>      <tibble>    

This is a nested data structure with everything stored in a single tibble. The functions tidy, glance and augment extract the relevant information from the model output and arrange it in a tidy way.
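
To see what each of the three verbs returns, they can be tried on a single fit; a minimal sketch on the TV-only regression (assuming \(\texttt{broom}\) is loaded, e.g. via \(\texttt{tidymodels}\)):

Code
lm_tv = lm(sales ~ TV, data = adv_data)
tidy(lm_tv)     # one row per coefficient: estimate, std.error, statistic, p.value
glance(lm_tv)   # one row per model: r.squared, sigma, AIC, ...
augment(lm_tv)  # one row per observation: .fitted, .resid, ...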

Advertising data: nested structure

The quantities nested in the tibble can be pulled out, or they can be expanded within the tibble itself, using unnest

Code
avd_lm_models_nest  |> unnest(model_params) 
# A tibble: 6 × 10
  medium       datasets model_output term  estimate std.error statistic  p.value
  <chr>     <list<tibb> <list>       <chr>    <dbl>     <dbl>     <dbl>    <dbl>
1 TV          [200 × 2] <lm>         (Int…   7.03     0.458       15.4  1.41e-35
2 TV          [200 × 2] <lm>         budg…   0.0475   0.00269     17.7  1.47e-42
3 newspaper   [200 × 2] <lm>         (Int…  12.4      0.621       19.9  4.71e-49
4 newspaper   [200 × 2] <lm>         budg…   0.0547   0.0166       3.30 1.15e- 3
5 radio       [200 × 2] <lm>         (Int…   9.31     0.563       16.5  3.56e-39
6 radio       [200 × 2] <lm>         budg…   0.202    0.0204       9.92 4.35e-19
# ℹ 2 more variables: model_metrics <list>, model_fitted <list>

Advertising data: nested structure

The quantities nested in the tibble can be pulled out, or they can be expanded within the tibble itself, using unnest

Code
avd_lm_models_nest  |> unnest(model_params) |> 
  select(medium,term:p.value) |>
  kbl(digits = 4) |> kable_styling(font_size = 10) |> 
  row_spec(1:2,background = "#D9FDEC") |> 
  row_spec(3:4,background = "#CAF6FC") |> 
  row_spec(5:6,background = "#FDB5BA")
medium term estimate std.error statistic p.value
TV (Intercept) 7.0326 0.4578 15.3603 0.0000
TV budget 0.0475 0.0027 17.6676 0.0000
newspaper (Intercept) 12.3514 0.6214 19.8761 0.0000
newspaper budget 0.0547 0.0166 3.2996 0.0011
radio (Intercept) 9.3116 0.5629 16.5422 0.0000
radio budget 0.2025 0.0204 9.9208 0.0000

   

The effect of budget on sales differs significantly from 0 for all the media considered

Ordinary least squares

\[\min_{\hat{\beta}_{0},\hat{\beta}_{1}}: RSS=\sum_{i=1}^{n}{e_{i}^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{y}_{i})^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}\]

OLS estimator of \(\beta_{0}\)

\[\begin{split}&\partial_{\hat{\beta}_{0}}\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}= -2\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})}=0\rightarrow\\ &\sum_{i=1}^{n}{y_{i}}-n\hat{\beta}_{0}-\hat{\beta}_{1}\sum_{i=1}^{n}{x_{i}}=0\rightarrow \hat{\beta}_{0}=\bar{y}-\hat{\beta}_{1}\bar{x}\end{split}\]

Ordinary least squares

\[\min_{\hat{\beta}_{0},\hat{\beta}_{1}}\sum_{i=1}^{n}{e_{i}^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{y}_{i})^{2}}=\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}\]

OLS estimator of \(\beta_{1}\)

\[\begin{split}&\partial_{\hat{\beta}_{1}}\sum_{i=1}^{n}{(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})^{2}}= -2\sum_{i=1}^{n}{x_{i}(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i})}=0 \rightarrow \sum_{i=1}^{n}{x_{i}y_{i}}-\hat{\beta}_{0}\sum_{i=1}^{n}{x_{i}}-\hat{\beta}_{1}\sum_{i=1}^{n}{x_{i}^{2}}=0\\ &\hat{\beta}_{1}\sum_{i=1}^{n}{x_{i}^{2}}=\sum_{i=1}^{n}{x_{i}y_{i}}-\sum_{i=1}^{n}{x_{i}}\left(\frac{\sum_{i=1}^{n}{y_{i}}}{n}-\hat{\beta}_{1}\frac{\sum_{i=1}^{n}{x_{i}}}{n}\right)\\ &\hat{\beta}_{1}\left(n\sum_{i=1}^{n}{x_{i}^{2}}-(\sum_{i=1}^{n}{x_{i}})^{2} \right)= n\sum_{i=1}^{n}{x_{i}y_{i}}-\sum_{i=1}^{n}{x_{i}}\sum_{i=1}^{n}{y_{i}}\\ &\hat{\beta}_{1}=\frac{n\sum_{i=1}^{n}{x_{i}y_{i}}-\sum_{i=1}^{n}{x_{i}}\sum_{i=1}^{n}{y_{i}}} {n\sum_{i=1}^{n}{x_{i}^{2}}-(\sum_{i=1}^{n}{x_{i}})^{2} }=\frac{\sigma_{xy}}{\sigma^{2}_{x}} \end{split}\]
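
Both closed-form expressions can be checked numerically against \(\texttt{lm()}\); a minimal sketch using the TV budget as the single predictor (any consistent definition of \(\sigma_{xy}\) and \(\sigma^{2}_{x}\) works, since the \(n-1\) factors cancel):

Code
# check the OLS closed forms against lm(), with x = TV budget and y = sales
x = adv_data$TV
y = adv_data$sales
beta1_hat = cov(x, y) / var(x)             # sigma_xy / sigma^2_x
beta0_hat = mean(y) - beta1_hat * mean(x)  # ybar - beta1_hat * xbar
c(beta0_hat, beta1_hat)
coef(lm(y ~ x))                            # same values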

Three regression lines

Code
three_regs_plot=avd_lm_models_nest  |> unnest(model_fitted) |> 
  select(medium,sales:.fitted) |> 
  ggplot(aes(x=budget,y=sales))+theme_minimal()+facet_grid(~medium,scales="free")+
  geom_point(alpha=.5,color = "indianred")+
  geom_smooth(method="lm")+
  geom_segment(aes(x=budget,xend=budget,y=.fitted,yend=sales),color="forestgreen",alpha=.25)


Advertising data: nested structure

The quantities nested in the tibble can be pulled out, or they can be expanded within the tibble itself, using \(\texttt{unnest}\)

Code
avd_lm_models_nest  |> unnest(model_metrics) |> 
  select(medium,r.squared:df.residual) |>
  kbl(digits = 4) |> kable_styling(font_size = 10) |> 
  row_spec(1,background = "#D9FDEC") |> 
  row_spec(2,background = "#CAF6FC") |> 
  row_spec(3,background = "#FDB5BA")
medium r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual
TV 0.6119 0.6099 3.2587 312.1450 0.0000 1 -519.0457 1044.091 1053.986 2102.531 198
newspaper 0.0521 0.0473 5.0925 10.8873 0.0011 1 -608.3357 1222.671 1232.566 5134.805 198
radio 0.3320 0.3287 4.2749 98.4216 0.0000 1 -573.3369 1152.674 1162.569 3618.479 198

Linear regression: model assumptions

The linear regression model is

\[y_{i}=\beta_{0}+\beta_{1}x_{i}+\epsilon_{i}\]

where \(\epsilon_{i}\) is a random variable with expected value of 0. For inference, more assumptions must be made:

  • \(\epsilon_{i}\sim N(0,\sigma^{2})\); in particular, the variance of the errors \(\sigma^{2}\) does not depend on \(x_{i}\).

  • \(Cov(\epsilon_{i},\epsilon_{i'})=0\) , for all \(i\neq i'\) and \(i,i'=1,\ldots,n\).

  • \(x_{i}\) non stochastic

It follows that \(y_{i}\) is a random variable such that

  • \(E[y_{i}]=\beta_{0}+\beta_{1}x_{i}\)

  • \(Var[y_{i}]=\sigma^{2}\)

Linear regression: model assumptions

[Figure: the linear regression model assumptions, from Statistics for Business and Economics (Anderson, Sweeney and Williams, 2011)]

\(\sigma^{2}\) estimator

The variance \(\sigma^{2}\) is assumed to be constant, but it is unknown, and it has to be estimated:

  • since \(y_{i}\sim N(\beta_{0}+\beta_{1}x_{i},\sigma^{2})\) , it follows that

\[\frac{y_{i}-\beta_{0}-\beta_{1}x_{i}}{\sigma}\sim N(0,1)\]

  • furthermore, recall that \(\sum_{i=1}^{n}\left[N(0,1)\right]^{2} = \chi^{2}_{n}\) , then

\[\sum_{i=1}^{n}\left[N(0,1)\right]^{2}=\sum_{i=1}^{n}\left[\frac{y_{i}-\beta_{0}-\beta_{1}x_{i}}{\sigma}\right]^{2}=\chi^{2}_{n}\]

  • replacing \(\beta_{0}\) and \(\beta_{1}\) with their estimators \(\hat{\beta}_{0}\) and \(\hat{\beta}_{1}\) , the previous becomes

\[\frac{\sum_{i=1}^{n}\left(y_{i}-\hat{\beta}_{0}-\hat{\beta}_{1}x_{i}\right)^{2}}{\sigma^{2}}=\frac{RSS}{\sigma^{2}}=\chi^{2}_{n-2}\] two degrees of freedom are lost, as the two parameters are replaced by their estimators.

\(\sigma^{2}\) estimator

  • Finally, since \(E\left[\chi^{2}_{n-2}\right]=n-2\) then

\[E\left[\chi^{2}_{n-2}\right]=E\left[\frac{RSS}{\sigma^{2}}\right]=n-2\]

  • and because \(\sigma^{2}\) is constant, it can be pulled out of the expectation, and the previous can be rewritten as

\[\frac{E\left[RSS\right]}{\sigma^{2}}=n-2\]

  • it follows that \[\sigma^{2}=E\left[\frac{RSS}{n-2}\right]\] and \(\frac{RSS}{n-2}\) is an unbiased estimator of \(\sigma^{2}\).

  • \(\sqrt{\frac{RSS}{n-2}}=RSE\), the so-called residual standard error (checked numerically in the sketch below)
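
A quick numerical check of the RSE on the TV regression (the same quantity is returned by \(\texttt{sigma()}\) and reported by \(\texttt{glance()}\) as \(\texttt{sigma}\)):

Code
fit = lm(sales ~ TV, data = adv_data)
rss = sum(residuals(fit)^2)   # residual sum of squares
n   = nobs(fit)
sqrt(rss / (n - 2))           # RSE = sqrt(RSS / (n - 2))
sigma(fit)                    # same value, as computed by R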

\(\hat{\beta}_{1}\) as a linear combination of \(y_{i}\)

\[\begin{split} \hat{\beta}_{1}&= \frac{\sum_{i=1}^{n}{\left(y_{i}-\bar{y}\right)\left(x_{i}-\bar{x}\right)} }{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}} =\frac{\sum_{i=1}^{n}{\left[y_{i} \left(x_{i}-\bar{x}\right) -\bar{y}\left(x_{i}-\bar{x}\right)\right]} }{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}}=\\ &=\frac{\sum_{i=1}^{n}{y_{i} \left(x_{i}-\bar{x}\right)} -\bar{y}\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)}}{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}} = \sum_{i=1}^{n}{y_{i} \frac{\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{n}{\left(x_{i}-\bar{x}\right)^{2}}}}= \sum_{i=1}^{n}{w_{i}y_{i}}\end{split}\]

Given a linear combination \(U_{i}=c+d V_{i}\), if \(V_{i}\sim N(\mu_{v},\sigma^{2}_{v})\), then \(U_{i}\sim N(c+d\mu_{v},d^{2}\sigma^{2}_{v})\).

Since \(y_{i}\sim N(\beta_{0}+\beta_{1}x_{i},\sigma^{2})\) and \(\hat{\beta}_{1}=\sum_{i=1}^{n}{w_{i}y_{i}}\), here \(c=0\) and \(d=w_{i}\), so

\[\hat{\beta}_{1}\sim N(\sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})},\sum_{i=1}^{n}{w_{i}^{2}\sigma^{2}})\]

Parameters of the \(\hat{\beta}_{1}\) distribution: \(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)

The expectation of \(\hat{\beta}_{1}\) is

  • \(E\left[\hat{\beta}_{1}\right] = \sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)

    \(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)

Parameters of the \(\hat{\beta}_{1}\) distribution: \(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)

The expectation of \(\hat{\beta}_{1}\) is

  • \(E\left[\hat{\beta}_{1}\right] = \sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)

    \(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\beta_{1}\frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\)

Parameters of the \(\hat{\beta}_{1}\) distribution: \(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)

The expectation of \(\hat{\beta}_{1}\) is

  • \(E\left[\hat{\beta}_{1}\right] = \sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)

    \(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\beta_{1}\frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{\sum_{i=1}^{n}{x_{i}^2}-\bar{x}\sum_{i=1}^{n}{x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)

Parameters of the \(\hat{\beta}_{1}\) distribution: \(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)

The expectation of \(\hat{\beta}_{1}\) is

  • \(E\left[\hat{\beta}_{1}\right] = \sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)

    \(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\beta_{1}\frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{\sum_{i=1}^{n}{x_{i}^2}-\bar{x}\sum_{i=1}^{n}{x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\left(\sum_{i=1}^{n}{x_{i}^2}-\frac{\sum_{i=1}^{n}{x_{i}}}{n}\sum_{i=1}^{n}{x_{i}}\right)=\)

Parameters of the \(\hat{\beta}_{1}\) distribution: \(E\left[\hat{\beta}_{1}\right]=\beta_{1}\)

The expectation of \(\hat{\beta}_{1}\) is

  • \(E\left[\hat{\beta}_{1}\right] = \sum_{i=1}^{n}{w_{i}(\beta_{0}+\beta_{1}x_{i})} = \beta_{0}\underbrace{\sum_{i=1}^{n}{w_{i}}}_{0}+\beta_{1}\sum_{i=1}^{n}{w_{i}x_{i}}=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\)

    \(=\beta_{1}\sum_{i=1}^{n}{\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}x_{i}}=\beta_{1}\frac{\sum_{i=1}^{n}{(x_{i}-\bar{x})x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{\sum_{i=1}^{n}{x_{i}^2}-\bar{x}\sum_{i=1}^{n}{x_{i}}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\left(\sum_{i=1}^{n}{x_{i}^2}-\frac{\sum_{i=1}^{n}{x_{i}}}{n}\sum_{i=1}^{n}{x_{i}}\right)=\)

    \(=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n}=\)

Note: \(\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n} \frac{1}{n} =\frac{n\sum_{i=1}^{n}{x_{i}^2}}{n^{2}}-\frac{\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n^{2}}=\frac{\sum_{i=1}^{n}{x_{i}^2}}{n}-\bar{x}^{2}=var(x)\)

therefore \(\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n} = var(x)n=\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\)

\(=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\frac{n\sum_{i=1}^{n}{x_{i}^2}-\left[\sum_{i=1}^{n}{x_{i}}\right]^2}{n}=\beta_{1}\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}} \sum_{i=1}^{n}(x_{i}-\bar{x})^{2}=\beta_{1}\)

Parameters of the \(\hat{\beta}_{1}\) distribution: \(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)

The variance of \(\hat{\beta}_{1}\) is

\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}}=\)

Parameters of the \(\hat{\beta}_{1}\) distribution: \(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)

The variance of \(\hat{\beta}_{1}\) is

\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}} =\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}\)

Parameters of the \(\hat{\beta}_{1}\) distribution: \(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)

The variance of \(\hat{\beta}_{1}\) is

\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}} =\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}=\sum_{i=1}^{n}{\frac{(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}\)

Parameters of the \(\hat{\beta}_{1}\) distribution: \(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)

The variance of \(\hat{\beta}_{1}\) is

\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}} =\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}=\sum_{i=1}^{n}{\frac{(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}\)

Parameters of the \(\hat{\beta}_{1}\) distribution: \(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)

The variance of \(\hat{\beta}_{1}\) is

\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}} =\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}=\sum_{i=1}^{n}{\frac{(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\sigma^{2}\)

Parameters of the \(\hat{\beta}_{1}\) distribution: \(var\left(\hat{\beta}_{1}\right) = \frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)

The variance of \(\hat{\beta}_{1}\) is

\(var\left(\hat{\beta}_{1}\right) = \sum_{i=1}^{n}{w^{2}_{i}\sigma^{2}} =\sum_{i=1}^{n}{\left[\frac{x_{i}-\bar{x}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\right]^{2}}\sigma^{2}=\sum_{i=1}^{n}{\frac{(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{\left[\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right]^{2}}}\sigma^{2}={\frac{1}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\sigma^{2}\)

And the standard error of \(\hat{\beta}_{1}\) is

\(SE\left(\hat{\beta}_{1}\right) = \sqrt{\frac{\sigma^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\)

Inference on \(\hat{\beta}_{1}\): hypothesis testing

The null hypothesis is \(\beta_{1}=0\) and the test statistic is

\[\frac{\hat{\beta}_{1}-\beta_{1}}{SE(\hat{\beta}_{1})}\]

which, under the null hypothesis, reduces to \(\frac{\hat{\beta}_{1}}{SE(\hat{\beta}_{1})}\), distributed as a Student's \(t\) with \(n-2\) d.f.

\[\text{Reject } H_{0} \text{ if } \left|\frac{\hat{\beta}_{1}}{SE(\hat{\beta}_{1})}\right|>t_{1-\frac{\alpha}{2},n-2}\]
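
The standard error, the test statistic and its p-value can be reproduced from the formulas above; a minimal sketch on the TV regression (compare with the corresponding row of the single-regression table):

Code
# reproduce SE(beta1_hat), the t statistic and the two-sided p-value for the TV regression
fit   = lm(sales ~ TV, data = adv_data)
x     = adv_data$TV
n     = nobs(fit)
rse   = sqrt(sum(residuals(fit)^2) / (n - 2))  # estimate of sigma
se_b1 = rse / sqrt(sum((x - mean(x))^2))       # SE(beta1_hat)
t_b1  = coef(fit)["TV"] / se_b1                # test statistic under H0: beta1 = 0
p_b1  = 2 * pt(abs(t_b1), df = n - 2, lower.tail = FALSE)
c(se_b1, t_b1, p_b1)                           # compare with summary(fit)$coefficients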

Linear regression: multiple predictors

  • \(y\) is the numeric response; \(X_{1},\ldots,X_{p}\) are the predictors, the general formula for the linear model is \[y = f(X)+\epsilon=\beta_{0}+\sum_{j=1}^{p}X_{j}\beta_{j}+ \epsilon\]

In algebraic form

\[{\bf y}={\bf X}{\bf \beta}+{\bf \epsilon}\]

\[\begin{bmatrix} y_{1}\\ y_{2}\\ y_{3}\\ \vdots\\ y_{n} \end{bmatrix}=\begin{bmatrix} 1& x_{1,1}&\ldots&x_{1,p}\\ 1& x_{2,1}&\ldots&x_{2,p}\\ 1& x_{3,1}&\ldots&x_{3,p}\\ \vdots&\vdots&\vdots&\\ 1& x_{n,1}&\ldots&x_{n,p}\\ \end{bmatrix}\begin{bmatrix} \beta_{0}\\ \beta_{1}\\ \vdots\\ \beta_{p}\\ \end{bmatrix}+\begin{bmatrix} \epsilon_{1}\\ \epsilon_{2}\\ \epsilon_{3}\\ \vdots\\ \epsilon_{n} \end{bmatrix}\]

Linear regression: ordinary least square (OLS)

OLS target

\[\min_{\hat{\beta}_{0},\hat{\beta}_{1},\ldots,\hat{\beta}_{p}}\sum_{i=1}^{n}\left(y_{i}-\hat{\beta}_{0}-\sum_{j=1}^{p}x_{i,j}\hat{\beta}_{j}\right)^{2}=\min_{\hat{\beta}_{0},\hat{\beta}_{1},\ldots,\hat{\beta}_{p}}RSS\]

OLS target (algebraic) \[ \min_{\hat{\beta}} ({\bf y}-{\bf X}{\bf \hat{\beta}})^{\sf T}({\bf y}-{\bf X}{\bf \hat{\beta}})\]

Linear regression: ordinary least square (OLS) solution

OLS target
\[\min_{\hat{\beta}}({\bf y}-{\bf X}{\bf \hat{\beta}})^{\sf T}({\bf y}-{\bf X}{\bf \hat{\beta}})=\min_{\hat{\beta}}\left({\bf y}^{\sf T}{\bf y}-{\bf y}^{\sf T}{\bf X} {\bf \hat{\beta}}-{\bf{\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf y}+{\bf {\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf X} {\bf \hat{\beta}}\right)\]

First order conditions

\[\partial_{\hat{\beta}}\left({\bf y}^{\sf T}{\bf y}-{\bf y}^{\sf T}{\bf X} {\bf \hat{\beta}}-{\bf{\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf y}+{\bf {\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf X} {\bf \hat{\beta}}\right)=0\]

Since \(\partial_{\bf\hat{\beta}}\left({\bf y}^{\sf T}{\bf X} {\bf \hat{\beta}}\right)=\partial_{\bf\hat{\beta}}\left({ \bf {\hat{\beta}}}^{\sf T}{\bf X}^{\sf T}{\bf y}\right)={\bf X}^{\sf T}{\bf y}\) it follows that

\[\partial_{\hat{\beta}}RSS=-2{\bf X}^{\sf T}{\bf y}+2{\bf X}^{\sf T}{\bf X} {\bf \hat{\beta}}=0\]

And the OLS solution is

\[ \hat{\beta} = ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf y} \]

Distributional assumptions

Assuming \(\epsilon \sim N(0,\sigma^{2}{\bf I})\) and non-stochastic predictors, it follows that

\[E[{\bf y}]= E[{\bf X}\beta+\epsilon]={\bf X}\beta \ \ \text{ and } \ \ var[{\bf y}]= \sigma^{2}{\bf I}\]

Distribution of linear combinations

\[ \text{if } \quad {\bf U}\sim N({\bf \mu},{\bf\Sigma}) \quad \text{ and } \quad {\bf V} = {\bf c}+{\bf D}{\bf U} \quad \text{then} \quad {\bf V}\sim N({\bf c}+{\bf D}\mu,{\bf D}{\bf \Sigma}{\bf D}^{\sf T})\]

Distribution of \(\hat{\beta}\)

\(\hat{\beta} = ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf y}\) is a linear combination of \(\bf y\) , with \({\bf c} = 0\) and \({\bf D} = ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}\) , then

  • \(E[\hat{\beta}]={\bf D} E[y] = {\bf D}{\bf X}\beta= ({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf X}\beta = \beta\)

  • \(var(\hat{\beta})= {\bf D} \sigma^{2}{\bf D}^{\sf T}=({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}\sigma^{2}[({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}]^{\sf T} = \sigma^{2}({\bf X}^{\sf T}{\bf X})^{-1}{\bf X}^{\sf T}{\bf X}({\bf X}^{\sf T}{\bf X})^{-1}=\sigma^{2}({\bf X}^{\sf T}{\bf X})^{-1}\) (both results are checked numerically in the sketch below)
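
Both results can be checked numerically on the Advertising data; a minimal sketch, with \(\sigma^{2}\) replaced by its \(RSS/(n-p-1)\) estimate, as \(\texttt{lm()}\) does:

Code
# matrix OLS on the Advertising data, checked against lm()
X = model.matrix(~ TV + radio + newspaper, data = adv_data)
y = adv_data$sales
beta_hat = solve(t(X) %*% X) %*% t(X) %*% y       # (X'X)^{-1} X'y
fit = lm(sales ~ TV + radio + newspaper, data = adv_data)
cbind(beta_hat, coef(fit))                        # same coefficients
s2 = sum(residuals(fit)^2) / (nrow(X) - ncol(X))  # estimate of sigma^2
vcov_hat = s2 * solve(t(X) %*% X)                 # sigma^2 (X'X)^{-1}
max(abs(vcov_hat - vcov(fit)))                    # ~ 0: same covariance matrix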

Advertising data: multiple linear regression

At this stage, there is no pre-processing and no test-set evaluation: the regression model is simply fit on the whole data set

pre-processing: define a \(\texttt{recipe}\)

Code
adv_recipe = recipe(sales~., data=adv_data)

model specification: define a \(\texttt{parsnip}\) model

Code
adv_model = linear_reg(mode="regression", engine="lm")

define the \(\texttt{workflow}\)

Code
adv_wflow = workflow() |> 
  add_recipe(recipe=adv_recipe) |> 
  add_model(adv_model)

fit the model

Code
adv_fit = adv_wflow |> 
  fit(data=adv_data)

Advertising data: back to single models

medium term estimate std.error statistic p.value
TV (Intercept) 7.0326 0.4578 15.3603 0.0000
TV budget 0.0475 0.0027 17.6676 0.0000
newspaper (Intercept) 12.3514 0.6214 19.8761 0.0000
newspaper budget 0.0547 0.0166 3.2996 0.0011
radio (Intercept) 9.3116 0.5629 16.5422 0.0000
radio budget 0.2025 0.0204 9.9208 0.0000

Advertising data: single vs multiple regression

Code
avd_lm_models_nest  |> unnest(model_params) |> 
  select(medium,term:p.value) |>
  filter(term!="(Intercept)") |> arrange(medium) |> 
  kbl(digits = 4) |> kable_styling(font_size = 12) |> 
  column_spec(c(3,6),bold=TRUE) |> 
  row_spec(1,background = "#D9FDEC") |> 
  row_spec(2,background = "#CAF6FC") |> 
  row_spec(3,background = "#FDB5BA") 
medium term estimate std.error statistic p.value
TV budget 0.0475 0.0027 17.6676 0.0000
newspaper budget 0.0547 0.0166 3.2996 0.0011
radio budget 0.2025 0.0204 9.9208 0.0000
Code
adv_fit |> tidy() |> 
  filter(term!="(Intercept)") |> arrange(term) |> 
  kbl(digits = 4) |> kable_styling(font_size = 12) |> 
  column_spec(c(2,5),bold=TRUE) |> 
  row_spec(1,background = "#D9FDEC") |> 
  row_spec(2,background = "#CAF6FC") |> 
  row_spec(3,background = "#FDB5BA") 
term estimate std.error statistic p.value
TV 0.0458 0.0014 32.8086 0.0000
newspaper -0.0010 0.0059 -0.1767 0.8599
radio 0.1885 0.0086 21.8935 0.0000

 

 

why these contradictory results?

Advertising data: single vs multiple regression

Code
library("corrr")
adv_data |> correlate()  |>  network_plot()
  • \(\texttt{newspaper}\) is correlated with \(\texttt{radio}\), and the latter has a significant effect on \(\texttt{sales}\) in the multiple regression model

  • that is why, in the single model, \(\texttt{newspaper}\) appears to have a significant effect on \(\texttt{sales}\) (\(\texttt{radio}\) is ignored)

regression by successive orthogonalisations

computing multiple regression coefficients by means of single regressions

Algebraic formalization of single regression

Consider the data to be centered, \(\bar{y}=\bar{x}=0\): this means that \(\beta_{0}=0\) (no-intercept model), so \({\bf y}=\beta_{1}{\bf x}+\epsilon\)

The \(\hat{\beta}_{1}\) estimator

\[\hat{\beta}_{1}=\frac{\sum_{i}^{n}{x_{i}y_{i}}}{\sum_{i}^{n}{x_{i}^{2}}}=\frac{{\bf x}^{\sf T}{\bf y}}{{\bf x}^{\sf T}{\bf x}}=\frac{\langle {\bf x},{\bf y}\rangle}{\langle {\bf x},{\bf x}\rangle}\]

\({\langle {\bf a},{\bf b}\rangle}\) is the inner product \({\bf a}^{\sf T}{\bf b}\).

Consider the two predictors model \(y = \beta_{1}X_{1}+\beta_{2}X_{2}+\epsilon\)

Note: if the two predictors are such that \({\bf x}_{1}^{\sf T}{\bf x}_{2} = 0\), then \(\hat{\beta}_{1}\) and \(\hat{\beta}_{2}\) can be computed, equivalently, by

  • fitting the multiple regression \(y = \beta_{1}X_{1}+\beta_{2}X_{2}+\epsilon\)

  • fitting the single regressions \(y = \beta_{1}X_{1}+\epsilon\) and \(y = \beta_{2}X_{2}+\epsilon\)

the orthogonal predictors case

Check the previous claim, given the predictors matrix

\[{\bf X}=\left[{ \begin{bmatrix} \\ \\ {\bf x}_{1} \\ \\ \\ \end{bmatrix} \begin{bmatrix} \\ \\ {\bf x}_{2} \\ \\ \\ \end{bmatrix}} \right]\]

and that

\[{\bf \hat{\beta}}=\left({\bf X}^{\sf T}{\bf X}\right)^{-1}{\bf X}^{\sf T}{\bf y} \rightarrow \left({\bf X}^{\sf T}{\bf X}\right)\hat{\beta} - {\bf X}^{\sf T}{\bf y}={\bf X}^{\sf T}\left({\bf X}\hat{\beta} - {\bf y}\right)=0\]
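
The claim can also be checked numerically; a minimal sketch with two simulated, exactly orthogonal (and centered) predictors, where single and multiple regression estimates coincide:

Code
set.seed(42)
x1 = rnorm(100); x1 = x1 - mean(x1)            # centered predictor
x2 = residuals(lm(rnorm(100) ~ x1))            # orthogonal to x1 (and mean zero) by construction
y  = 2 * x1 - 3 * x2 + rnorm(100, sd = .5)
y  = y - mean(y)                               # centered response, no-intercept models below
coef(lm(y ~ x1 + x2 - 1))                      # multiple regression
c(coef(lm(y ~ x1 - 1)), coef(lm(y ~ x2 - 1)))  # single regressions: same estimates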

the non-orthogonal predictors case

In real data, the predictors are essentially never all pair-wise orthogonal.

  • single and multiple regression coefficients estimates will differ

  • multiple regression coefficient estimates can still be obtained via single regressions, through…

 

 

successive orthogonalizations

Regression via successive orthogonalizations

Consider the four predictors model (with no intercept)

\[y=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\beta_{4}X_{4}+\epsilon\]

step 1

  • set \({\bf z}_{1}={\bf x}_{1}\) and fit \({\bf x}_{2}=\beta_{2|1}{\bf z}_{1}+\epsilon\)

  • compute \(\hat{\beta}_{2|1}=\frac{\langle {\bf z}_{1},{\bf x}_{2}\rangle}{\langle {\bf z}_{1},{\bf z}_{1}\rangle}\)

  • compute \({\bf z}_{2}={\bf x}_{2}-\hat{\beta}_{2|1}{\bf z}_{1}\)

Regression via successive orthogonalizations

Consider the four predictors model (with no intercept)

\[y=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\beta_{4}X_{4}+\epsilon\]

Compute the value for \(\hat{\beta}_{4}\) using successive orthogonalizations

step 2

  • fit \({\bf x}_{3}=\beta_{3|1}{\bf z}_{1}+\epsilon\) and \({\bf x}_{3}=\beta_{3|2}{\bf z}_{2}+\epsilon\)

  • compute \(\hat{\beta}_{3|1}=\frac{\langle {\bf z}_{1},{\bf x}_{3}\rangle}{\langle {\bf z}_{1},{\bf z}_{1}\rangle}\) and \(\hat{\beta}_{3|2}=\frac{\langle {\bf z}_{2},{\bf x}_{3}\rangle}{\langle {\bf z}_{2},{\bf z}_{2}\rangle}\)

  • compute \({\bf z}_{3}={\bf x}_{3}-\hat{\beta}_{3|1}{\bf z}_{1}-\hat{\beta}_{3|2}{\bf z}_{2}\)

Regression via successive orthogonalizations

Consider the four predictors model (with no intercept)

\[y=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\beta_{4}X_{4}+\epsilon\]

Compute the value for \(\hat{\beta}_{4}\) using successive orthogonalizations

step 3

  • fit \({\bf x}_{4}=\beta_{4|1}{\bf z}_{1}+\epsilon\) , \({\bf x}_{4}=\beta_{4|2}{\bf z}_{2}+\epsilon\) and \({\bf x}_{4}=\beta_{4|3}{\bf z}_{3}+\epsilon\)

  • compute \(\hat{\beta}_{4|1}=\frac{\langle {\bf z}_{1},{\bf x}_{4}\rangle}{\langle {\bf z}_{1},{\bf z}_{1}\rangle}\) , \(\hat{\beta}_{4|2}=\frac{\langle {\bf z}_{2},{\bf x}_{4}\rangle}{\langle {\bf z}_{2},{\bf z}_{2}\rangle}\) and \(\hat{\beta}_{4|3}=\frac{\langle {\bf z}_{3},{\bf x}_{4}\rangle}{\langle {\bf z}_{3},{\bf z}_{3}\rangle}\)

  • compute \({\bf z}_{4}={\bf x}_{4}-\hat{\beta}_{4|1}{\bf z}_{1}-\hat{\beta}_{4|2}{\bf z}_{2}-\hat{\beta}_{4|3}{\bf z}_{3}\)

Regression via successive orthogonalizations

Consider the four predictors model (with no intercept)

\[y=\beta_{1}X_{1}+\beta_{2}X_{2}+\beta_{3}X_{3}+\beta_{4}X_{4}+\epsilon\]

Compute the value for \(\hat{\beta}_{4}\) using successive orthogonalizations

step 4

  • fit \({\bf y}=\beta_{4}{\bf z}_{4}+\epsilon\)

  • compute \(\hat{\beta}_{4}=\frac{\langle {\bf z}_{4},{\bf y}\rangle}{\langle {\bf z}_{4},{\bf z}_{4}\rangle}\) (the whole procedure is sketched in code below)
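
The whole procedure is easy to code; a minimal sketch on the (centered) Advertising data, computing the coefficient of the last predictor entered (\(\texttt{newspaper}\)) and checking it against the multiple regression:

Code
# successive orthogonalisations: beta_hat of the last predictor entered matches the multiple regression
ctr  = function(v) v - mean(v)                                       # center a vector
proj = function(z, v) as.numeric(crossprod(z, v) / crossprod(z, z))  # <z, v> / <z, z>
y  = ctr(adv_data$sales)
x1 = ctr(adv_data$TV); x2 = ctr(adv_data$radio); x3 = ctr(adv_data$newspaper)
z1 = x1
z2 = x2 - proj(z1, x2) * z1
z3 = x3 - proj(z1, x3) * z1 - proj(z2, x3) * z2
proj(z3, y)                                                          # beta_hat for newspaper
coef(lm(sales ~ TV + radio + newspaper, data = adv_data))["newspaper"]  # same value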

Regression via successive orthogonalizations: a further look

  • The residuals vector \({\bf e}=\left({\bf y}-{\bf \hat{y}}\right)\) is orthogonal to the predictor \({\bf x}\) , that is \(\left({\bf y}-{\bf \hat{y}}\right)^{\sf T}{\bf x}=0\)

\[\begin{split} ({\bf y}-{\bf \hat{y}})^{\sf T}{\bf x}&={\bf y}^{\sf T}{\bf x}-{\bf \hat{y}}^{\sf T}{\bf x} \\ &={\bf y}^{\sf T}{\bf x} - \underbrace{({\bf x} ({\bf x}^{\sf T}{\bf x})^{-1}{\bf x}^{\sf T}{\bf y})^{\sf T}}_{\bf \hat{y}^{\sf T}}{\bf x}= \\ &={\bf y}^{\sf T}{\bf x}- {\bf y}^{\sf T}{\bf x} ({\bf x}^{\sf T}{\bf x})^{-1}{\bf x}^{\sf T}{\bf x}= {\bf y}^{\sf T}{\bf x}- {\bf y}^{\sf T}{\bf x}=0 \end{split}\]

  • recall that \({\bf z}_{2}={\bf x}_{2}-\hat{\beta}_{2|1}{\bf z}_{1}\); then \({\bf z}_{2}\) is orthogonal to \({\bf z}_{1}\), and since \({\bf z}_{1}={\bf x}_{1}\), \({\bf z}_{2}\) is orthogonal to \({\bf x}_{1}\), too.

  • \({\bf z}_{2}\) is \({\bf x}_{2}\) adjusted to be orthogonal to \({\bf z}_{1}\) ( \({\bf x}_{1}\) ).

  • \({\bf z}_{3}\) is \({\bf x}_{3}\) adjusted to be orthogonal to \({\bf z}_{2}\) and to \({\bf z}_{1}\).

  • \({\bf z}_{4}\) is \({\bf x}_{4}\) adjusted to be orthogonal to \({\bf z}_{3}\), \({\bf z}_{2}\) and to \({\bf z}_{1}\).

the multiple regression \(\hat{\beta}_{j}\) can be obtained via the single regression \({\bf y}=\hat{\beta}_{j}{\bf z}_{j}+\epsilon\), where \({\bf z}_{j}\) is \({\bf x}_{j}\) adjusted with respect to the other predictors.

Regression via successive orthogonalizations: some considerations

the multiple regression-based \(\hat{\beta}_{j}\) can be obtained via the single regression \({\bf y}=\hat{\beta}_{j}{\bf z}_{j}+\epsilon\)

  • If the predictors are pair-wise orthogonal, the single regression coefficients of predictor \(j\) on residual \(i\) are \(\hat{\beta}_{j|i}=0\), so \({\bf z}_{j}={\bf x}_{j}, j= 1,\ldots,p\).

  • So \(\hat{\beta}_{j}=\frac{\langle {\bf z}_{j}, {\bf y}\rangle}{\langle {\bf z}_{j}, {\bf z}_{j}\rangle}=\frac{\langle {\bf x}_{j}, {\bf y}\rangle}{\langle {\bf x}_{j}, {\bf x}_{j}\rangle}\) , just a single regression of \({\bf y}\) on \({\bf x}_{j}\)

  • if \({\bf x}_{j}\) is highly correlated with one or more of the other predictors, the squared norm \(\|{\bf z}_{j}\|^{2}\approx 0\)

    • this makes sense, since the predictor in question does not add any new information about \({\bf y}\)
  • as \(\|{\bf z}_{j}\|^{2} = \langle {\bf z}_{j}, {\bf z}_{j}\rangle\approx 0\), the variability of the OLS estimator explodes, since \(SE(\hat{\beta}_{j})=\frac{\sigma}{\sqrt{\|{\bf z}_{j}\|^{2}}}\) (a small simulation below illustrates this)
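
A small simulation makes the last point concrete; a sketch with two nearly collinear predictors, whose standard errors blow up compared to the independent case:

Code
set.seed(7)
x1        = rnorm(200)
x2_indep  = rnorm(200)                 # unrelated to x1
x2_collin = x1 + rnorm(200, sd = .01)  # almost a copy of x1
y_indep   = 1 + 2 * x1 + x2_indep  + rnorm(200)
y_collin  = 1 + 2 * x1 + x2_collin + rnorm(200)
summary(lm(y_indep  ~ x1 + x2_indep ))$coefficients[, "Std. Error"]
summary(lm(y_collin ~ x1 + x2_collin))$coefficients[, "Std. Error"]  # much larger SEs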

Interactions

Code
library(scatterplot3d)

adv_fit_update = adv_wflow |> update_recipe(adv_recipe|> step_rm("newspaper"))|> 
  fit(data=adv_data)
 
data3d=adv_fit_update |> extract_fit_engine() |> augment(data=adv_data) |> 
  mutate(und_ov_est = ifelse(.std.resid<0,"indianred","blue")) |> select(TV, radio, sales, und_ov_est)

plot_3d = scatterplot3d(x=data3d$TV,y=data3d$radio,z=data3d$sales,xlab="TV",ylab="radio",zlab="sales",color=data3d$und_ov_est,pch=16,box=FALSE,grid=TRUE)
plot_3d$plane3d(adv_fit_update |> extract_fit_engine(),draw_polygon=TRUE,polygon_args = list(col = rgb(.1,.6,.4,.25)))

intermission: more stuff you already know

Goodness-of-fit

Global test statistic F

testing blocks of predictors

autocorrelation

heteroscedasticity

outliers and high leverage

multicollinearity

end of intermission

estimate the model performance (on the test set)

resampling methods

the credit data set

want: predict balance as a response

Code
library(janitor)
data(Credit,package = "ISLR2")
credit=Credit |> clean_names() 
credit |> slice_sample(n=12) |> kbl() |> kable_styling(font_size = 10)
income limit rating cards age education own student married region balance
63.931 5728 435 3 28 14 Yes No Yes East 581
28.144 1567 142 3 51 10 No No Yes South 0
24.230 4756 351 2 64 15 Yes No Yes South 594
43.540 2906 232 4 69 11 No No No South 0
76.782 5977 429 4 44 12 No No Yes West 548
44.473 3500 257 3 81 16 Yes No No East 8
36.362 5183 376 3 49 15 No No Yes East 654
23.989 4523 338 4 31 15 No No No South 601
22.379 3965 292 2 34 14 Yes No Yes West 384
53.217 4943 362 2 46 16 Yes No Yes West 382
10.503 2923 232 3 25 18 Yes No Yes East 191
26.067 3388 266 4 74 17 Yes No Yes East 155

a (random) smaller version of the credit data set

Code
set.seed(1234)
credit_small = credit |> select(balance,sample(names(credit)[-11],3)) 
credit_small |> slice_sample(n=12) |> kbl() |> kable_styling(font_size = 10)
balance region education age
47 West 17 57
0 West 10 24
912 East 13 49
0 West 9 62
155 East 17 74
1587 South 16 56
637 East 19 44
1176 East 9 52
732 South 12 66
391 South 11 45
0 South 15 75
772 South 18 48

validation approaches

train/test


Randomly select a proportion (e.g. .75) of observations for training, the rest is for testing

validation set

Select a proportion (e.g. .8) of observations to train the model, the rest is for validation

cross-validation

  • Leave-one-out cross-validation: all observations but one are for training, the one left is for validation. The procedure is iterated on the observations, until each observation is used for validation.

  • K-fold cross-validation: K-1 folds are for training, the K\(^{th}\) fold is for validation. The procedure is iterated until each fold is used for validation.

  • the CV estimate of the evaluation metric is the average over the \(n\) (or \(K\)) iterations (a hand-rolled sketch follows)
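
To make the averaging explicit, a hand-rolled K-fold sketch on \(\texttt{credit_small}\) (purely illustrative: the \(\texttt{rsample}\)-based workflow used below does the same bookkeeping, with stratification):

Code
# hand-rolled 5-fold CV estimate of the RMSE for balance ~ .
set.seed(123)
k     = 5
folds = sample(rep(1:k, length.out = nrow(credit_small)))  # random fold assignment
fold_rmse = sapply(1:k, function(i) {
  fit   = lm(balance ~ ., data = credit_small[folds != i, ])
  preds = predict(fit, newdata = credit_small[folds == i, ])
  sqrt(mean((credit_small$balance[folds == i] - preds)^2))
})
mean(fold_rmse)   # the CV estimate is the average over the K folds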

Validation-set to assess model performance

train/test split

Code
first_split = initial_split(data = credit_small,prop=.75,strata = balance)
cred_tr = training(first_split)
cred_test = testing(first_split)

analysis/validate split

Code
val_split = validation_split(data = cred_tr, prop=.8,strata = balance)

split and resample objects: printing shows just minimal info

Code
first_split
<Training/Testing/Total>
<299/101/400>
Code
val_split
# Validation Set Split (0.8/0.2)  using stratification 
# A tibble: 1 × 2
  splits           id        
  <list>           <chr>     
1 <split [239/60]> validation
Code
val_split$splits[[1]]
<Training/Validation/Total>
<239/60/299>

Validation-set to assess model performance

split objects: they are lists

Code
glimpse(first_split)
List of 4
 $ data  :'data.frame': 400 obs. of  4 variables:
  ..$ balance  : num [1:400] 333 903 580 964 331 ...
  ..$ region   : Factor w/ 3 levels "East","South",..: 2 3 3 3 2 2 1 3 2 1 ...
  ..$ education: num [1:400] 11 15 11 11 16 10 12 9 13 19 ...
  ..$ age      : num [1:400] 34 82 71 36 68 77 37 87 66 41 ...
 $ in_id : int [1:299] 12 16 17 23 25 32 34 35 41 49 ...
 $ out_id: logi NA
 $ id    : tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  ..$ id: chr "Resample1"
 - attr(*, "class")= chr [1:3] "initial_split" "mc_split" "rsplit"
Code
str(val_split)
v_splt [1 × 2] (S3: validation_split/rset/tbl_df/tbl/data.frame)
 $ splits:List of 1
  ..$ :List of 4
  .. ..$ data  :'data.frame':   299 obs. of  4 variables:
  .. .. ..$ balance  : num [1:299] 0 0 0 0 0 0 0 0 50 0 ...
  .. .. ..$ region   : Factor w/ 3 levels "East","South",..: 2 1 1 1 2 3 2 3 1 3 ...
  .. .. ..$ education: num [1:299] 16 15 17 10 15 16 10 14 14 15 ...
  .. .. ..$ age      : num [1:299] 64 57 73 61 57 43 30 25 54 72 ...
  .. ..$ in_id : int [1:239] 3 4 5 8 10 11 12 13 14 15 ...
  .. ..$ out_id: logi NA
  .. ..$ id    : tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
  .. .. ..$ id: chr "validation"
  .. ..- attr(*, "class")= chr [1:2] "val_split" "rsplit"
 $ id    : chr "validation"
 - attr(*, "prop")= num 0.8
 - attr(*, "strata")= chr "balance"
 - attr(*, "breaks")= num 4
 - attr(*, "pool")= num 0.1
 - attr(*, "fingerprint")= chr "95463bd733510d7274291db8f6792543"

Validation-set to assess model performance

Note: training()/testing() and analysis()/assessment() are equivalent helper functions to extract the data from the split object…

Code
training(first_split) |> slice(1:6) |> kbl() |> kable_styling(font_size = 8)
balance region education age
0 South 16 64
0 East 15 57
0 East 17 73
0 East 10 61
0 South 15 57
0 West 16 43
Code
first_split$data[first_split$in_id,] |> slice(1:6) |> kbl() |> kable_styling(font_size = 8)
balance region education age
0 South 16 64
0 East 15 57
0 East 17 73
0 East 10 61
0 South 15 57
0 West 16 43
Code
testing(first_split) |> slice(1:6) |> kbl() |> kable_styling(font_size = 8)
balance region education age
333 South 11 34
903 West 15 82
1151 South 10 77
204 West 7 57
1081 South 9 49
891 West 9 28
Code
first_split$data[-first_split$in_id,] |> slice(1:6) |> kbl() |> kable_styling(font_size = 8)
balance region education age
333 South 11 34
903 West 15 82
1151 South 10 77
204 West 7 57
1081 South 9 49
891 West 9 28

Validation-set to assess model performance

Note: training()/testing() and analysis()/assessment() are equivalent helper functions to extract the data from the split object…

Code
analysis(first_split) |> slice(1:6) |> kbl() |> kable_styling(font_size = 8)
balance region education age
0 South 16 64
0 East 15 57
0 East 17 73
0 East 10 61
0 South 15 57
0 West 16 43
Code
first_split$data[first_split$in_id,] |> slice(1:6) |> kbl() |> kable_styling(font_size = 8)
balance region education age
0 South 16 64
0 East 15 57
0 East 17 73
0 East 10 61
0 South 15 57
0 West 16 43
Code
assessment(first_split) |> slice(1:6) |> kbl() |> kable_styling(font_size = 8)
balance region education age
333 South 11 34
903 West 15 82
1151 South 10 77
204 West 7 57
1081 South 9 49
891 West 9 28
Code
first_split$data[-first_split$in_id,] |> slice(1:6) |> kbl() |> kable_styling(font_size = 8)
balance region education age
333 South 11 34
903 West 15 82
1151 South 10 77
204 West 7 57
1081 South 9 49
891 West 9 28

Validation-set to assess model performance

pre-processing

Code
cred_an = analysis(val_split$splits[[1]]) 
cred_prep = recipe(balance~., data = cred_an)

model-spec

Code
cred_mod = linear_reg(mode="regression", engine="lm")

workflow setup

Code
cred_wflow = workflow() |> 
  add_recipe(cred_prep) |> 
  add_model(cred_mod)

Validation-set to assess model performance

model fit

Code
cred_fit = cred_wflow |> 
  fit(cred_an)

assessment set predictions

Code
cred_as = assessment(val_split$splits[[1]])
cred_preds = cred_fit |> augment(cred_as) |> select(balance,.pred)
cred_preds |> slice_sample(n=3) |> kbl() |> kable_styling(font_size=9)
balance .pred
429 547.6457
843 512.9616
1120 501.6879

assessment metric: RMSE

Code
cred_preds |> rmse(truth=balance, estimate=.pred) |> 
  kbl() |> kable_styling(font_size=9)
.metric .estimator .estimate
rmse standard 455.6267
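
The same number follows directly from the definition of the RMSE; a one-line check on the prediction tibble:

Code
# RMSE by hand: square root of the mean squared prediction error on the assessment set
cred_preds |> summarise(rmse_manual = sqrt(mean((balance - .pred)^2)))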

Validation-set to assess model performance: a shortcut

RMSE computation

Code
cred_wflow |> 
  fit_resamples(val_split) |> 
  collect_metrics()
# Resampling results
# Validation Set Split (0.8/0.2)  using stratification 
# A tibble: 1 × 4
  splits           id         .metrics         .notes          
  <list>           <chr>      <list>           <list>          
1 <split [239/60]> validation <tibble [2 × 4]> <tibble [0 × 3]>

Validation-set to assess model performance: shortcut

RMSE computation

Code
cred_wflow |> 
  fit_resamples(val_split) |> 
  collect_metrics()
.metric .estimator mean n std_err .config
rmse standard 455.6267 1 NA Preprocessor1_Model1

v-fold cross validation to assess model performance

data split: folds

Code
cred_folds = vfold_cv(cred_tr,v=5,strata=balance)

pre-processing

Code
cred_prep = recipe(balance~., data = cred_tr)

model-spec

Code
cred_mod = linear_reg(mode="regression", engine="lm")

workflow setup

Code
cred_wflow = workflow() |> 
  add_recipe(cred_prep) |> 
  add_model(cred_mod)

v-fold cross validation to assess model performance

model fit

Code
cred_fit_vfolds = cred_wflow |> 
  fit_resamples(cred_folds)
# Resampling results
# 5-fold cross-validation using stratification 
# A tibble: 5 × 4
  splits           id    .metrics         .notes          
  <list>           <chr> <list>           <list>          
1 <split [239/60]> Fold1 <tibble [2 × 4]> <tibble [0 × 3]>
2 <split [239/60]> Fold2 <tibble [2 × 4]> <tibble [0 × 3]>
3 <split [239/60]> Fold3 <tibble [2 × 4]> <tibble [0 × 3]>
4 <split [239/60]> Fold4 <tibble [2 × 4]> <tibble [0 × 3]>
5 <split [240/59]> Fold5 <tibble [2 × 4]> <tibble [0 × 3]>

cross-validated RMSE

Code
cred_fit_vfolds |> 
  collect_metrics() |> filter(.metric == "rmse") |> 
  kbl() |> kable_styling(font_size = 9)
.metric .estimator mean n std_err .config
rmse standard 456.6134 5 11.20367 Preprocessor1_Model1

Leave One Out cross validation to assess model performance

Note: leave-one-out cross-validation is somewhat deprecated; there is a specific function \(\texttt{loo_cv}\) in \(\texttt{rsample}\), but its support downstream is limited.

LOO-CV splits

Code
cred_loo = loo_cv(cred_tr)

The resulting object is not supported by \(\texttt{fit_resamples}\): a workaround is to still use \(\texttt{vfold_cv}\) and set as many folds as there are rows in the training set

Code
cred_loo = vfold_cv(cred_tr,v=nrow(cred_tr))

Leave One Out cross validation to assess model performance

model fit

Code
cred_fit_loo = cred_wflow |>
  fit_resamples(cred_loo)

cross-validated RMSE

Code
cred_fit_loo |>
  collect_metrics() |> filter(.metric == "rmse") |>
  kbl() |> kable_styling(font_size = 9)
.metric .estimator mean n std_err .config
rmse standard 389.9147 299 14.094 Preprocessor1_Model1

Hyperparameter tuning

tiny example on polynomial regression: balance vs rating

Consider a tiny example of a tuning process:

Code
credit_tiny = credit |> select(rating, balance)
credit_split = initial_split(credit_tiny,prop=3/4)
credit_tiny |> mutate(train_test = replace(rep("test", n()),credit_split$in_id,"train")
                       ) |> 
  ggplot(aes(x = rating,y = balance,color=train_test)) + geom_point() + theme_minimal()

tiny example on polynomial regression: balance vs rating

  • define the CV-folds on the training set
Code
tiny_cred_folds = vfold_cv(training(credit_split),5)
  • the degree of the polynomial is set at the recipe level. Since it is a hyperparameter, we set it to \(\texttt{tune()}\), a placeholder
Code
tiny_rec = recipe(formula = balance~rating, data = training(credit_split)) |> 
  step_poly(rating, degree=tune())

The model specification does not change, so \(\texttt{cred_mod}\) can still be used. The workflow is then

Code
tiny_wflow = workflow() |> add_recipe(tiny_rec) |> add_model(cred_mod)

It remains to specify the hyperparameter grid and to use the function \(\texttt{tune_grid()}\), which does all the work

Code
hparm_grid = tibble(degree=1:10)
tiny_cred_tuning = tiny_wflow |> 
  tune_grid(resamples = tiny_cred_folds,
            grid = hparm_grid,
            control = control_grid(save_pred = TRUE)
  )

tiny example on polynomial regression: balance vs rating

Check the results

Code
tiny_cred_tuning |>  collect_metrics() |> filter(.metric=="rmse") |> slice_min(mean) |> 
  kbl() |> kable_styling(font_size = 10)
degree .metric .estimator mean n std_err .config
7 rmse standard 215.0154 5 8.148388 Preprocessor07_Model1
Code
autoplot(tiny_cred_tuning,metric = "rmse")+theme_minimal()

tiny example on polynomial regression: balance vs rating

Select the best performing model

Code
tiny_cred_final_mod = tiny_cred_tuning |> select_best(metric = "rmse")

Finalize the workflow (meaning: tell the workflow to pick the best model)

Code
final_wflow=tiny_wflow |> finalize_workflow(tiny_cred_final_mod)

Final fit and evaluation of the model: fit on the training set, evaluate on the test set. Use \(\texttt{credit_split}\), the result of the initial split (\(\texttt{initial_split()}\)).

Code
tiny_final_fit = final_wflow |> 
  last_fit(credit_split)

tiny_final_fit |> extract_fit_parsnip()
parsnip model object


Call:
stats::lm(formula = ..y ~ ., data = data)

Coefficients:
  (Intercept)  rating_poly_1  rating_poly_2  rating_poly_3  rating_poly_4  
        537.0         6923.0         -794.7         -115.0         1181.9  
rating_poly_5  rating_poly_6  rating_poly_7  
       -724.6         -415.9          401.8  
Code
tiny_final_fit |> collect_metrics()
# A tibble: 2 × 4
  .metric .estimator .estimate .config             
  <chr>   <chr>          <dbl> <chr>               
1 rmse    standard     229.    Preprocessor1_Model1
2 rsq     standard       0.732 Preprocessor1_Model1

tiny example on polynomial regression: balance vs rating

Code
tiny_final_fit |> extract_fit_parsnip()
parsnip model object


Call:
stats::lm(formula = ..y ~ ., data = data)

Coefficients:
  (Intercept)  rating_poly_1  rating_poly_2  rating_poly_3  rating_poly_4  
        537.0         6923.0         -794.7         -115.0         1181.9  
rating_poly_5  rating_poly_6  rating_poly_7  
       -724.6         -415.9          401.8  
Code
tiny_final_fit |> collect_metrics() |> kbl() |> kable_styling(font_size = 8)
.metric .estimator .estimate .config
rmse standard 229.2952944 Preprocessor1_Model1
rsq standard 0.7324706 Preprocessor1_Model1
