Distributional Parameters

Posted by Mischa Fisher in Econometrics   
Wed 01 March 2017

Part of my thesis involved modeling survival times against parametric distributions, such as the Weibull, log-logistic, and exponential distributions.

One of the fun aspects of distribution theory is seeing how different parameter specifications can make some distributions special forms of other distributions. For today's quick chart, a lead in to the subject by looking at how a couple of commonly used survival analysis parameters resemble one another (with the parameter specifications highlighted in the R code below).

x_lower <- 0
x_upper <- 10
max_height2 <- max(dexp(x_lower:x_upper, rate = 1, log = FALSE), 
       dweibull(x_lower:x_upper, shape = 1, log = FALSE),
       dlogis(x_lower:x_upper, scale = 1, log = FALSE))
ggplot(data.frame(x = c(x_lower, x_upper)), aes(x = x)) + xlim(x_lower, x_upper) + 
 ylim(0, max_height2) +
 stat_function(fun = dexp, args = list(rate = 2), aes(colour = "Exponential")) + 
 stat_function(fun = dweibull, args = list(shape = 2), aes(colour = "Weibull")) + 
 stat_function(fun = dlogis, args = list(scale = 2), aes(colour = "Logistic")) + 
 scale_color_manual("Distribution", values = c("blue", "green", "red")) +
labs(x = "\n x", y = "f(x) \n", 
title = "Common Survival Analysis Distribution Density Plots \n") + 
theme(plot.title = element_text(hjust = 0.5), 
axis.title.x = element_text(face="bold", colour="blue", size = 12),
axis.title.y = element_text(face="bold", colour="blue", size = 12),
legend.title = element_text(face="bold", size = 10),
legend.position = "top") + theme_economist()


A Brief Exercise Illustrating the Central Limit Theorem

Posted by Mischa Fisher in Econometrics   
Thu 26 November 2015

Succinctly, the Central Limit Theorem can be expressed as:

In probability theory, the central limit theorem (CLT) states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed, regardless of the underlying distribution.

If you're an econometrics student, it's likely the first time you're exposed to the CLT is when discussing the OLS estimator in the context of testing hypothesis about the true parameters in order to form confidence intervals: when relying on asymptotics, the normality of the error distributions is not required because as long as the normal 5 Gauss-Markov assumptions are satisfied, the distribution of the OLS estimator will converge to a normal distribution as n goes to infinity.

The CLT is pretty neat and is given short shrift in the context of econometrics, so, here's a brief experiment one can perform in R to illustrate what happens as the theorem comes into effect.

Starting with the Weibull distribution:

plot(sort(rweibull(10000, shape=1)), main="The Weibull Distribution")

We then sample the means of the distribution in increasing replications, then draw histograms from the distributions.

hist(colMeans(replicate(30,rweibull(100,shape=1))),breaks="Scott", xlab="Sample Means", main="Histogram for 30 Replications")
hist(colMeans(replicate(300,rweibull(100,shape=1))),breaks="Scott", xlab="Sample Means", main="Histogram for 300 Replications")
hist(colMeans(replicate(3000,rweibull(100,shape=1))),breaks="Scott", xlab="Sample Means", main="Histogram for 3,000 Replications")
hist(colMeans(replicate(30000,rweibull(100,shape=1))),breaks="Scott", xlab="Sample Means", main="Histogram for 30,000 Replications")
hist(colMeans(replicate(300000,rweibull(100,shape=1))),breaks="Scott", xlab="Sample Means", main="Histogram for 300,000 Replications")
hist(colMeans(replicate(3000000,rweibull(100,shape=1))),breaks="Scott", xlab="Sample Means", main="Histogram for 3,000,000 Replications")

Revealing, a very wonderful .gif :

As the replication size increases, the histogram begins to resemble a normal distribution. Neat!


A friend writes:

You should emphasize more the process that you are taking the MEAN of sub samples of your 'population' which is Weibull distributed. Then, by creating a vector of these means, one is able to show that these "means" converge to a normal distribution as N approaches infinity. As it is, to the 'less' experienced reader perhaps will fail to realize that you take the means of sub samples of that pop'n, and then it is the means which become normally distributed.

Good point; thanks Keith!


An Intuitive Explanation of the OLS Estimator for both Traditional and Matrix Algebra

Posted by Mischa Fisher in Econometrics   
Mon 19 October 2015

The Ordinary Least Squares estimator, β^ \hat{\beta} is the first thing one learns in econometrics. It has two forms, one in standard algebra and one in matrix algebra, but it's important to remember the two are equivalent:

β^=cov^(x,y)var(x)=(XX)1XY \hat{\beta} = \frac{\hat{cov}(x,y)}{var(x)} = \mathbf{({X}'X)^{-1}{X}'Y}

I think most students will find it extremely easy to get lost in notation and miss the link to be made with real world data. The following exercise is a helpful way I found to make sure one continues to make the link between traditional 'simple' notation, Matrix Algebra notation, and the underlying data and arithmetic that goes into the ordinary linear regression estimator.

Deriving the Algebraic Notation for the Simple Bivariate Model

The familiar simple bivariate model is expressed as an independent observation as a function of an intercept, a regression coefficient, and an error term (respectively):

yi=b0+b1x1+ei y_{i} = b_{0} + b_{1}x_{1} + e_{i}

Where we wish to minimize the sum of squared errors (SSE):

minimize:SSE=i=1Nei2 minimize: SSE = \sum_{i=1}^{N} e_{i}^{2}

To do so we isolate the error of the regression to make it a function of the other terms:

ei=yib0b1x1 e_{i} = y_{i} - b_{0} - b_{1}x_{1}

Then substitute:

minimize:i=1N(yib0b1x1)2 minimize: \sum_{i=1}^{N} (y_{i} - b_{0} - b_{1}x_{1})^{2}

For our purposes, we'll ignore the derivation of the intercept and take it as a given that it is yˉβ1^xˉ \bar{y} - \hat{\beta_{1}}\bar{x} and just solve for the β^ \hat{\beta} slope coefficient. To minimize the errors, we need to take the partial derivative with respect to b1 b_{1}

SSEb1=b1[i=1N(yib0b1x1)2] \frac{\partial SSE }{\partial b_{1}} = \frac{\partial }{\partial b_{1}} \left [ \sum_{i=1}^{N} (y_{i} - b_{0} - b_{1}x_{1})^{2} \right ]

Move the summation operator through since the the derivative of a sum is equal to the sum of the derivatives:

SSEb1=i=1N[b1(yib0b1x1)2] \frac{\partial SSE }{\partial b_{1}} = \sum_{i=1}^{N} \left [ \frac{\partial }{\partial b_{1}} (y_{i} - b_{0} - b_{1}x_{1})^{2} \right ]

Take the derivative (using the chain rule), then setting it equal to 0 for the first order condition to find the min/max:

SSEb1=2i=1Nxi(yib0b1x1)=0 \frac{\partial SSE }{\partial b_{1}} = -2 \sum_{i=1}^{N} x_{i}(y_{i} - b_{0} - b_{1}x_{1}) = 0

Then multiply by 12 - \frac{1}{2} to simplify:

0=i=1Nxi(yib0b1x1) 0 = \sum_{i=1}^{N} x_{i}(y_{i} - b_{0} - b_{1}x_{1})

Substitute the solution for the intercept, b0 b_{0} , that we took as a given above:

0=i=1Nxi(yi(yˉβ1^xˉ)b1x1) 0 = \sum_{i=1}^{N} x_{i}(y_{i} - (\bar{y} - \hat{\beta_{1}}\bar{x} ) - b_{1}x_{1})

Then rearrange and distribute the summation operator to solve for β1^ \hat{\beta_{1}} :

β1^=i=1N(yiyˉ)xii=1N(xixˉ)xi \hat{\beta_{1}} = \frac{\sum_{i=1}^{N} (y_{i} - \bar{y} )x_{i}}{ \sum_{i=1}^{N} (x_{i} - \bar{x})x_{i} }

Which is algebraically equivalent to:

$$ \frac{\hat{cov}(x,y)}{var(x …