# An Intuitive Explanation of the OLS Estimator in both Traditional and Matrix Algebra

The Ordinary Least Squares estimator, $\hat{\beta}$, is the first thing one learns in econometrics. It has two forms, one in standard algebra and one in matrix algebra, but it's important to remember that the two are equivalent:

### $\hat{\beta} = \frac{\hat{cov}(x,y)}{var(x)} = \mathbf{({X}'X)^{-1}{X}'Y}$

I think most students find it extremely easy to get lost in notation and miss the link to real-world data. The following exercise is a helpful way I found to keep the connection between traditional 'simple' notation, matrix algebra notation, and the underlying data and arithmetic that go into the ordinary linear regression estimator.
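Before working through the derivation, it can help to see numerically that the two forms of $\hat{\beta}$ agree. Below is a minimal sketch in Python with NumPy, using a small made-up dataset (the numbers are purely illustrative):

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# 'Simple' form: slope = sample covariance of (x, y) over sample variance of x.
beta1_simple = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Matrix form: stack a column of ones (the intercept) next to x,
# then compute (X'X)^{-1} X'y.
X = np.column_stack([np.ones_like(x), x])
beta_matrix = np.linalg.inv(X.T @ X) @ X.T @ y

print(beta1_simple)    # slope from the covariance formula
print(beta_matrix[1])  # the same slope from the matrix formula
```

Both computations recover the same slope coefficient; the matrix form additionally returns the intercept as its first element.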

### Deriving the Algebraic Notation for the Simple Bivariate Model

The familiar simple bivariate model expresses each observation as a function of an intercept, a regression coefficient times the regressor, and an error term (respectively):

$y_{i} = b_{0} + b_{1}x_{i} + e_{i}$

Where we wish to minimize the sum of squared errors (SSE):

$minimize: SSE = \sum_{i=1}^{N} e_{i}^{2}$

To do so we isolate the error of the regression to make it a function of the other terms:

$e_{i} = y_{i} - b_{0} - b_{1}x_{i}$

Then substitute:

$minimize: \sum_{i=1}^{N} (y_{i} - b_{0} - b_{1}x_{i})^{2}$

For our purposes, we'll skip the derivation of the intercept and take it as given that it is $\bar{y} - \hat{\beta_{1}}\bar{x}$, solving only for the $\hat{\beta}$ slope coefficient. To minimize the errors, we take the partial derivative with respect to $b_{1}$:
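The intercept result we are taking as given can be checked numerically: $\bar{y} - \hat{\beta_{1}}\bar{x}$ matches the intercept a least-squares solver returns. A small sketch, again on hypothetical data:

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Slope via the covariance formula.
beta1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Intercept taken as given: ybar - beta1 * xbar.
b0 = y.mean() - beta1 * x.mean()

# Compare with the intercept from a full least-squares fit.
X = np.column_stack([np.ones_like(x), x])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b0, b_hat[0])
```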

$\frac{\partial SSE }{\partial b_{1}} = \frac{\partial }{\partial b_{1}} \left [ \sum_{i=1}^{N} (y_{i} - b_{0} - b_{1}x_{i})^{2} \right ]$

Move the summation operator through, since the derivative of a sum is equal to the sum of the derivatives:

$\frac{\partial SSE }{\partial b_{1}} = \sum_{i=1}^{N} \left [ \frac{\partial }{\partial b_{1}} (y_{i} - b_{0} - b_{1}x_{i})^{2} \right ]$

Take the derivative (using the chain rule), then set it equal to 0 as the first-order condition to find the min/max:

$\frac{\partial SSE }{\partial b_{1}} = -2 \sum_{i=1}^{N} x_{i}(y_{i} - b_{0} - b_{1}x_{i}) = 0$

Then multiply by $- \frac{1}{2}$ to simplify:

$0 = \sum_{i=1}^{N} x_{i}(y_{i} - b_{0} - b_{1}x_{i})$

Substitute the solution for the intercept, $b_{0}$, that we took as given above:

$0 = \sum_{i=1}^{N} x_{i}(y_{i} - (\bar{y} - \hat{\beta_{1}}\bar{x} ) - b_{1}x_{i})$

Then rearrange and distribute the summation operator to solve for $\hat{\beta_{1}}$ :

$\hat{\beta_{1}} = \frac{\sum_{i=1}^{N} (y_{i} - \bar{y} )x_{i}}{ \sum_{i=1}^{N} (x_{i} - \bar{x})x_{i} }$
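The summation form above can be checked directly against the covariance formula in code. A short sketch, using the same kind of hypothetical data as before:

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Summation form: sum((y_i - ybar) * x_i) / sum((x_i - xbar) * x_i)
beta1_sum = np.sum((y - y.mean()) * x) / np.sum((x - x.mean()) * x)

# Covariance form: cov(x, y) / var(x)
beta1_cov = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print(beta1_sum, beta1_cov)
```

The two agree because demeaning one variable inside the sum leaves the cross-product (and hence the ratio) unchanged.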

Which is algebraically equivalent to:

$$ \frac{\hat{cov}(x,y)}{var(x)} $$