Distributional Parameters

Posted by Mischa Fisher in Econometrics   
Wed 01 March 2017

Part of my thesis involved modeling survival times against parametric distributions, such as the Weibull, log-logistic, and exponential distributions.

One of the fun aspects of distribution theory is seeing how particular parameter choices turn some distributions into special cases of others. Today's quick chart is a lead-in to the subject: it shows how a few commonly used survival analysis distributions resemble one another (with the parameter specifications highlighted in the R code below).

library(ggplot2)
library(ggthemes)  # provides theme_economist()

x_lower <- 0
x_upper <- 10
x_grid <- seq(x_lower, x_upper, length.out = 501)

# Parameter specifications, shared by the y-axis limit and the curves.
# Note: dlogis() is the logistic density; base R has no log-logistic density.
exp_rate <- 2
weibull_shape <- 2
logis_scale <- 2

# Height of the tallest curve, so that ylim() does not clip any of them
max_height2 <- max(dexp(x_grid, rate = exp_rate),
                   dweibull(x_grid, shape = weibull_shape),
                   dlogis(x_grid, scale = logis_scale))

ggplot(data.frame(x = c(x_lower, x_upper)), aes(x = x)) +
  xlim(x_lower, x_upper) +
  ylim(0, max_height2) +
  stat_function(fun = dexp, args = list(rate = exp_rate), aes(colour = "Exponential")) +
  stat_function(fun = dweibull, args = list(shape = weibull_shape), aes(colour = "Weibull")) +
  stat_function(fun = dlogis, args = list(scale = logis_scale), aes(colour = "Logistic")) +
  scale_color_manual("Distribution", values = c("blue", "green", "red")) +
  labs(x = "\n x", y = "f(x) \n",
       title = "Common Survival Analysis Distribution Density Plots \n") +
  theme_economist() +  # complete theme first, so the tweaks below are not overridden
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.x = element_text(face = "bold", colour = "blue", size = 12),
        axis.title.y = element_text(face = "bold", colour = "blue", size = 12),
        legend.title = element_text(face = "bold", size = 10),
        legend.position = "top")
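
As a concrete instance of one distribution being a special form of another, here is a minimal check that a Weibull with shape 1 collapses to an exponential. The grid and the rate of 2 are arbitrary choices for illustration, not values taken from the thesis.

```r
# A Weibull with shape = 1 and scale = b has density (1/b) * exp(-x/b),
# which is the exponential density with rate = 1/b.
x <- seq(0.1, 10, by = 0.1)   # illustrative grid

rate <- 2                     # illustrative rate
weibull_density     <- dweibull(x, shape = 1, scale = 1 / rate)
exponential_density <- dexp(x, rate = rate)

# TRUE when the two vectors agree up to numerical tolerance
densities_match <- isTRUE(all.equal(weibull_density, exponential_density))
print(densities_match)
```

Plotting both curves with these parameters would draw them on top of each other, which is the overlap the chart hints at.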



Distributions and Their Parameters

Posted by Mischa Fisher in Econometrics   
Tue 01 March 2016

Distribution theory

In probability theory, the central limit theorem (CLT) states that, under certain conditions, the arithmetic mean of a sufficiently large number of independent draws of a random variable, each with a well-defined expected value and a finite variance, will be approximately normally distributed, regardless of the underlying distribution.
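
For reference, one common formal version of this statement is the classical (Lindeberg-Lévy) CLT. It assumes the draws are independent and identically distributed, which is stronger than the verbal statement above strictly needs; the symbols $\mu$, $\sigma^2$ and $\bar{X}_n$ are introduced here only for illustration.

```latex
% X_1, ..., X_n i.i.d. with E[X_i] = \mu and Var(X_i) = \sigma^2 < \infty
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i ,
\qquad
\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr) \xrightarrow{\;d\;} \mathcal{N}\bigl(0, \sigma^2\bigr)
\quad \text{as } n \to \infty .
```

In words: for large $n$, the sample mean is approximately normal with mean $\mu$ and variance $\sigma^2 / n$.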

If you're an econometrics student, your first exposure to the CLT is likely to come when discussing the OLS estimator, in the context of testing hypotheses about the true parameters and forming confidence intervals. When relying on asymptotics, normality of the errors is not required: as long as the usual five Gauss-Markov assumptions are satisfied, the distribution of the (suitably centered and scaled) OLS estimator converges to a normal distribution as n goes to infinity.
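
A rough way to see this for OLS is to simulate a regression with clearly non-normal errors and look at the distribution of the slope estimates. This is only a sketch: the model y = 1 + 2x + e, the sample size, the number of simulations and the centered exponential errors are all made-up choices for illustration.

```r
set.seed(1)

n_obs  <- 200    # observations per simulated data set (illustrative)
n_sims <- 2000   # number of simulated data sets (illustrative)
x <- runif(n_obs)  # regressor, held fixed across simulations

slope_estimates <- replicate(n_sims, {
  errors <- rexp(n_obs, rate = 1) - 1   # skewed errors with mean zero
  y <- 1 + 2 * x + errors               # hypothetical true model
  coef(lm(y ~ x))[["x"]]                # OLS slope estimate
})

# The errors are far from normal, yet the slope estimates pile up
# symmetrically around the true slope of 2.
hist(slope_estimates, breaks = "Scott", xlab = "OLS slope estimate",
     main = "Sampling Distribution of the OLS Slope")
qqnorm(slope_estimates)
qqline(slope_estimates)
```

If the points in the Q-Q plot sit close to the reference line, the normal approximation is doing its job at this sample size.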

The CLT is pretty neat and is given short shrift in the context of econometrics, so, here's a brief experiment one can perform in R to illustrate what happens as the theorem comes into effect.

Starting with the Weibull distribution:

plot(sort(rweibull(10000, shape=1)), main="The Weibull Distribution")

We then repeatedly draw samples of 100 observations from this distribution, take the mean of each sample, and plot a histogram of those sample means for an increasing number of replications.

# For each replication count, draw that many samples of 100 Weibull(shape = 1)
# observations, take the mean of each sample, and plot a histogram of the means.
# The largest run builds a 100 x 3,000,000 matrix (about 2.4 GB of doubles).
for (n_reps in c(30, 300, 3000, 30000, 300000, 3000000)) {
  sample_means <- colMeans(replicate(n_reps, rweibull(100, shape = 1)))
  hist(sample_means, breaks = "Scott", xlab = "Sample Means",
       main = paste("Histogram for", format(n_reps, big.mark = ",", scientific = FALSE), "Replications"))
}

The result is a rather wonderful .gif: [animated sequence of the six histograms]

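As the number of replications increases, the histogram of sample means fills in and looks more and more like a normal distribution. Neat! (Strictly speaking, the near-normality comes from each mean averaging 100 draws; more replications simply make that shape easier to see.)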

UPDATE:

A friend writes:

You should emphasize more the process that you are taking the MEAN of sub samples of your 'population' which is Weibull distributed. Then, by creating a vector of these means, one is able to show that these "means" converge to a normal distribution as N approaches infinity. As it is, to the 'less' experienced reader perhaps will fail to realize that you take the means of sub samples of that pop'n, and then it is the means which become normally distributed.

Good point; thanks Keith!
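
Keith's point can be made explicit by separating the raw draws from their means. The sketch below uses an arbitrary 10,000 samples of size 100 (illustrative numbers) and checks the sample means against what theory suggests: a Weibull with shape 1 and scale 1 has mean 1 and variance 1, so means of 100 draws should centre on 1 with a standard deviation of about 1/sqrt(100) = 0.1.

```r
set.seed(42)

sample_size <- 100     # draws per sub-sample (as in the post)
n_samples   <- 10000   # number of sub-samples (illustrative)

# Each column is one sub-sample of 100 draws from the Weibull "population"
samples <- replicate(n_samples, rweibull(sample_size, shape = 1))

# One mean per sub-sample: this vector is what becomes approximately normal
sample_means <- colMeans(samples)

print(mean(sample_means))  # theory suggests a value near 1
print(sd(sample_means))    # theory suggests a value near 0.1

# The raw draws stay skewed; only the means look bell-shaped
hist(samples[, 1], breaks = "Scott", xlab = "Raw Draws",
     main = "One Sub-Sample of 100 Weibull Draws")
hist(sample_means, breaks = "Scott", xlab = "Sample Means",
     main = "Means of 10,000 Sub-Samples")
```

Comparing the two histograms side by side makes the distinction visible: the population remains Weibull, and it is the vector of means that takes on the normal shape.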
