An Intuitive Explanation of the OLS Estimator for both Traditional and Matrix Algebra

Posted by Mischa Fisher in Econometrics   
Mon 19 October 2015

The Ordinary Least Squares estimator, $\hat{\beta}$, is the first thing one learns in econometrics. It has two forms, one in standard algebra and one in matrix algebra, but it's important to remember the two are equivalent:

$$ \hat{\beta} = \frac{\hat{cov}(x,y)}{var(x)} = \mathbf{(X'X)^{-1}X'Y} $$
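To make the equivalence concrete, here is a minimal sketch in plain Python that computes the slope both ways. The data values and the function names `slope_cov_var` and `beta_matrix` are invented for illustration. It assumes that $\mathbf{X}$ carries a column of ones next to $x$, so the matrix form returns an intercept and a slope, and that the 2x2 inverse can be written out by hand rather than taken from a linear algebra library.

```python
# Toy data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def slope_cov_var(x, y):
    """Slope as the sample covariance of (x, y) over the sample variance of x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    return cov_xy / var_x


def beta_matrix(x, y):
    """(X'X)^{-1} X'Y for X = [1, x], with the 2x2 inverse written out by hand."""
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # X'X = [[n, sx], [sx, sxx]] and X'Y = [sy, sxy]
    det = n * sxx - sx * sx
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


slope_simple = slope_cov_var(x, y)
intercept_matrix, slope_matrix = beta_matrix(x, y)
print(slope_simple, slope_matrix, intercept_matrix)
```

The two slopes agree up to floating-point rounding; the matrix form additionally returns the intercept because of the column of ones.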

I think most students find it extremely easy to get lost in the notation and miss the link to real-world data. The following exercise is one I found helpful for keeping hold of the connection between traditional 'simple' notation, matrix algebra notation, and the underlying data and arithmetic that go into the ordinary least squares estimator.

Deriving the Algebraic Notation for the Simple Bivariate Model

The familiar simple bivariate model expresses each observation $y_{i}$ as a function of an intercept, a regression coefficient, and an error term (respectively):

$$ y_{i} = b_{0} + b_{1}x_{i} + e_{i} $$

Where we wish to minimize the sum of squared errors (SSE):

$$ \text{minimize: } SSE = \sum_{i=1}^{N} e_{i}^{2} $$

To do so we isolate the error of the regression to make it a function of the other terms:

$$ e_{i} = y_{i} - b_{0} - b_{1}x_{i} $$

Then substitute:

$$ \text{minimize: } \sum_{i=1}^{N} (y_{i} - b_{0} - b_{1}x_{i})^{2} $$
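As a rough illustration of what is being minimized, the sketch below evaluates this sum for two candidate lines. The data and both pairs of coefficients are made up for the example, and `sse` is a hypothetical helper name.

```python
def sse(b0, b1, x, y):
    """Sum of squared errors for the candidate line y = b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Toy data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

sse_a = sse(2.0, 0.5, x, y)  # first candidate line
sse_b = sse(2.2, 0.6, x, y)  # second candidate line
print(sse_a, sse_b)
```

The second line has the smaller sum of squared errors, so by this criterion it fits the toy data better than the first.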

For our purposes, we'll skip the derivation of the intercept, take it as given that it is $\bar{y} - \hat{\beta_{1}}\bar{x}$, and just solve for the slope coefficient $\hat{\beta_{1}}$. To minimize the sum of squared errors, we take the partial derivative with respect to $b_{1}$:

$$ \frac{\partial SSE}{\partial b_{1}} = \frac{\partial}{\partial b_{1}} \left[ \sum_{i=1}^{N} (y_{i} - b_{0} - b_{1}x_{i})^{2} \right] $$

Move the derivative inside the summation, since the derivative of a sum is equal to the sum of the derivatives:

$$ \frac{\partial SSE}{\partial b_{1}} = \sum_{i=1}^{N} \left[ \frac{\partial}{\partial b_{1}} (y_{i} - b_{0} - b_{1}x_{i})^{2} \right] $$

Take the derivative (using the chain rule), then set it equal to 0 for the first-order condition to find the min/max:

$$ \frac{\partial SSE}{\partial b_{1}} = -2 \sum_{i=1}^{N} x_{i}(y_{i} - b_{0} - b_{1}x_{i}) = 0 $$
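For a single observation, the chain-rule step can be spelled out as follows. The symbol $u_{i}$ is introduced here only as shorthand for the term in parentheses and is not part of the original notation:

$$ u_{i} = y_{i} - b_{0} - b_{1}x_{i}, \qquad \frac{\partial}{\partial b_{1}} u_{i}^{2} = 2u_{i} \cdot \frac{\partial u_{i}}{\partial b_{1}} = 2u_{i} \cdot (-x_{i}) = -2x_{i}(y_{i} - b_{0} - b_{1}x_{i}) $$

Summing this over all $i$ gives the first-order condition above.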

Then multiply both sides by $-\frac{1}{2}$ to simplify:

$$ 0 = \sum_{i=1}^{N} x_{i}(y_{i} - b_{0} - b_{1}x_{i}) $$

Substitute the solution for the intercept, $b_{0}$, that we took as a given above, writing $\hat{\beta_{1}}$ for the value of $b_{1}$ that satisfies the first-order condition:

$$ 0 = \sum_{i=1}^{N} x_{i}(y_{i} - (\bar{y} - \hat{\beta_{1}}\bar{x}) - \hat{\beta_{1}}x_{i}) $$

Then rearrange and distribute the summation operator to solve for $\hat{\beta_{1}}$:

$$ \hat{\beta_{1}} = \frac{\sum_{i=1}^{N} (y_{i} - \bar{y})x_{i}}{\sum_{i=1}^{N} (x_{i} - \bar{x})x_{i}} $$
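The slope formula above can be evaluated directly on data. The following is a minimal sketch, again on invented toy values and with the hypothetical helper name `ols_bivariate`. It also plugs the fitted coefficients back into the first-order condition to confirm that the sum of $x_{i}$ times the residual is zero up to rounding.

```python
def ols_bivariate(x, y):
    """Intercept and slope from the summation formula derived above."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum((yi - y_bar) * xi for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) * xi for xi in x)
    b1 = numerator / denominator
    b0 = y_bar - b1 * x_bar  # intercept taken as given in the text
    return b0, b1


# Toy data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0_hat, b1_hat = ols_bivariate(x, y)

# First-order condition: sum of x_i * e_i should vanish at the solution.
residuals = [yi - b0_hat - b1_hat * xi for xi, yi in zip(x, y)]
foc = sum(xi * ei for xi, ei in zip(x, residuals))
print(b0_hat, b1_hat, foc)
```

On this toy data the numerator is 6 and the denominator is 10, so the fitted slope is 0.6 and the intercept is 2.2.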

This ratio is algebraically equivalent to:

$$ \frac{\hat{cov}(x,y)}{var(x …