Refreshing Pelican

Posted by Mischa Fisher in Technology   
Tue 03 July 2018

Two years after rebuilding this site, Pelican - the python based static site generator I use - was in need of an update. Here's a brief summary of a few meaningful changes to the build:

(and kudos to my good friend and former Google/FB engineer Sahand Saba for leading the way on all of these changes)

Updating Math Rendering

The original version of this site used Mathjax to render LaTeX within Markdown, which worked well for displaying equations, but there was room for speed and bandwidth improvements. Enter Katex, "the fastest math typesetting library for the web".

It has a simple installation, no dependencies, and it's own CDN. The trick is removing the need to have in-browser javascript doing the rendering.

The github project file is here: https://github.com/Khan/KaTeX

Installation is as simple as a pelican_katex.py file referenced in the pelicanconf.py file:

```` PLUGINS = ['pelican_katex'] ````

And the repo here: Sahand Saba Katex Pelican repo

Configuring HTTPS for Amazon S3

Looking up at the URL bar, you'll notice everything is now being served in https. This is thanks to a link between Amazon Cloudfront and the S3 bucket that hosts this site.

Quick links to get started are here:

Versioning Using Bitbucket

Version control is a critical but often overlooked step not only embarrassingly frequently on the enterprise level, but also for many personal projects. Rather than running iterations of the site through copied folders hosted on my personal Dropbox account, I finally got around to version control using bitbucket. Git is of course more standard, but bitbucket provides free private repositories, which is a necessity for the backend python code being used to generate the site.

Favicon

Finally, there's now a favicon of the camera logo of the old soviet era camera that constitutes my personal site logo. There's a simple generator package available on github that can generate a favicon stack to work on just about any browser or platform.

Read more...


Distributional Parameters

Posted by Mischa Fisher in Econometrics   
Wed 01 March 2017

Part of my thesis involved modeling survival times against parametric distributions, such as the Weibull, log-logistic, and exponential distributions.

One of the fun aspects of distribution theory is seeing how different parameter specifications can make some distributions special forms of other distributions. For today's quick chart, a lead in to the subject by looking at how a couple of commonly used survival analysis parameters resemble one another (with the parameter specifications highlighted in the R code below).

x_lower <- 0
x_upper <- 10
max_height2 <- max(dexp(x_lower:x_upper, rate = 1, log = FALSE), 
       dweibull(x_lower:x_upper, shape = 1, log = FALSE),
       dlogis(x_lower:x_upper, scale = 1, log = FALSE))
ggplot(data.frame(x = c(x_lower, x_upper)), aes(x = x)) + xlim(x_lower, x_upper) + 
 ylim(0, max_height2) +
 stat_function(fun = dexp, args = list(rate = 2), aes(colour = "Exponential")) + 
 stat_function(fun = dweibull, args = list(shape = 2), aes(colour = "Weibull")) + 
 stat_function(fun = dlogis, args = list(scale = 2), aes(colour = "Logistic")) + 
 scale_color_manual("Distribution", values = c("blue", "green", "red")) +
labs(x = "\n x", y = "f(x) \n", 
title = "Common Survival Analysis Distribution Density Plots \n") + 
theme(plot.title = element_text(hjust = 0.5), 
axis.title.x = element_text(face="bold", colour="blue", size = 12),
axis.title.y = element_text(face="bold", colour="blue", size = 12),
legend.title = element_text(face="bold", size = 10),
legend.position = "top") + theme_economist()

Read more...



A Brief Exercise Illustrating the Central Limit Theorem

Posted by Mischa Fisher in Econometrics   
Thu 26 November 2015

Succinctly, the Central Limit Theorem can be expressed as:

In probability theory, the central limit theorem (CLT) states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed, regardless of the underlying distribution.

If you're an econometrics student, it's likely the first time you're exposed to the CLT is when discussing the OLS estimator in the context of testing hypothesis about the true parameters in order to form confidence intervals: when relying on asymptotics, the normality of the error distributions is not required because as long as the normal 5 Gauss-Markov assumptions are satisfied, the distribution of the OLS estimator will converge to a normal distribution as n goes to infinity.

The CLT is pretty neat and is given short shrift in the context of econometrics, so, here's a brief experiment one can perform in R to illustrate what happens as the theorem comes into effect.

Starting with the Weibull distribution:

plot(sort(rweibull(10000, shape=1)), main="The Weibull Distribution")

We then sample the means of the distribution in increasing replications, then draw histograms from the distributions.

hist(colMeans(replicate(30,rweibull(100,shape=1))),breaks="Scott", xlab="Sample Means", main="Histogram for 30 Replications")
hist(colMeans(replicate(300,rweibull(100,shape=1))),breaks="Scott", xlab="Sample Means", main="Histogram for 300 Replications")
hist(colMeans(replicate(3000,rweibull(100,shape=1))),breaks="Scott", xlab="Sample Means", main="Histogram for 3,000 Replications")
hist(colMeans(replicate(30000,rweibull(100,shape=1))),breaks="Scott", xlab="Sample Means", main="Histogram for 30,000 Replications")
hist(colMeans(replicate(300000,rweibull(100,shape=1))),breaks="Scott", xlab="Sample Means", main="Histogram for 300,000 Replications")
hist(colMeans(replicate(3000000,rweibull(100,shape=1))),breaks="Scott", xlab="Sample Means", main="Histogram for 3,000,000 Replications")

Revealing, a very wonderful .gif :

As the replication size increases, the histogram begins to resemble a normal distribution. Neat!

UPDATE:

A friend writes:

You should emphasize more the process that you are taking the MEAN of sub samples of your 'population' which is Weibull distributed. Then, by creating a vector of these means, one is able to show that these "means" converge to a normal distribution as N approaches infinity. As it is, to the 'less' experienced reader perhaps will fail to realize that you take the means of sub samples of that pop'n, and then it is the means which become normally distributed.

Good point; thanks Keith!

Read more...



An Intuitive Explanation of the OLS Estimator for both Traditional and Matrix Algebra

Posted by Mischa Fisher in Econometrics   
Mon 19 October 2015

The Ordinary Least Squares estimator, β^ \hat{\beta} is the first thing one learns in econometrics. It has two forms, one in standard algebra and one in matrix algebra, but it's important to remember the two are equivalent:

β^=cov^(x,y)var(x)=(XX)1XY \hat{\beta} = \frac{\hat{cov}(x,y)}{var(x)} = \mathbf{({X}'X)^{-1}{X}'Y}

I think most students will find it extremely easy to get lost in notation and miss the link to be made with real world data. The following exercise is a helpful way I found to make sure one continues to make the link between traditional 'simple' notation, Matrix Algebra notation, and the underlying data and arithmetic that goes into the ordinary linear regression estimator.

Deriving the Algebraic Notation for the Simple Bivariate Model

The familiar simple bivariate model is expressed as an independent observation as a function of an intercept, a regression coefficient, and an error term (respectively):

yi=b0+b1x1+ei y_{i} = b_{0} + b_{1}x_{1} + e_{i}

Where we wish to minimize the sum of squared errors (SSE):

minimize:SSE=i=1Nei2 minimize: SSE = \sum_{i=1}^{N} e_{i}^{2}

To do so we isolate the error of the regression to make it a function of the other terms:

ei=yib0b1x1 e_{i} = y_{i} - b_{0} - b_{1}x_{1}

Then substitute:

minimize:i=1N(yib0b1x1)2 minimize: \sum_{i=1}^{N} (y_{i} - b_{0} - b_{1}x_{1})^{2}

For our purposes, we'll ignore the derivation of the intercept and take it as a given that it is yˉβ1^xˉ \bar{y} - \hat{\beta_{1}}\bar{x} and just solve for the β^ \hat{\beta} slope coefficient. To minimize the errors, we need to take the partial derivative with respect to b1 b_{1}

SSEb1=b1[i=1N(yib0b1x1)2] \frac{\partial SSE }{\partial b_{1}} = \frac{\partial }{\partial b_{1}} \left [ \sum_{i=1}^{N} (y_{i} - b_{0} - b_{1}x_{1})^{2} \right ]

Move the summation operator through since the the derivative of a sum is equal to the sum of the derivatives:

SSEb1=i=1N[b1(yib0b1x1)2] \frac{\partial SSE }{\partial b_{1}} = \sum_{i=1}^{N} \left [ \frac{\partial }{\partial b_{1}} (y_{i} - b_{0} - b_{1}x_{1})^{2} \right ]

Take the derivative (using the chain rule), then setting it equal to 0 for the first order condition to find the min/max:

SSEb1=2i=1Nxi(yib0b1x1)=0 \frac{\partial SSE }{\partial b_{1}} = -2 \sum_{i=1}^{N} x_{i}(y_{i} - b_{0} - b_{1}x_{1}) = 0

Then multiply by 12 - \frac{1}{2} to simplify:

0=i=1Nxi(yib0b1x1) 0 = \sum_{i=1}^{N} x_{i}(y_{i} - b_{0} - b_{1}x_{1})

Substitute the solution for the intercept, b0 b_{0} , that we took as a given above:

0=i=1Nxi(yi(yˉβ1^xˉ)b1x1) 0 = \sum_{i=1}^{N} x_{i}(y_{i} - (\bar{y} - \hat{\beta_{1}}\bar{x} ) - b_{1}x_{1})

Then rearrange and distribute the summation operator to solve for β1^ \hat{\beta_{1}} :

β1^=i=1N(yiyˉ)xii=1N(xixˉ)xi \hat{\beta_{1}} = \frac{\sum_{i=1}^{N} (y_{i} - \bar{y} )x_{i}}{ \sum_{i=1}^{N} (x_{i} - \bar{x})x_{i} }

Which is algebraically equivalent to:

$$ \frac{\hat{cov}(x,y)}{var(x …

Read more...


Customizing Pelican for Static and Dynamic Content

Posted by Mischa Fisher in Technology   
Sun 13 September 2015

This is a a brief summary of a few changes I think were extremely helpful in tweaking the stock Pelican build for a customized, statically generated website:

Rendering Math Markup

Mathjax seemed to be the simplest solution. Embedding the information via a short snippet of code linking to their CDN, and then using simple LaTeX notation within Markdown takes only a minute or two:

&lt;script         src='https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'&gt;&lt;/script&gt;

Using Bootstrap for Responsive CSS

Erik Flowers Bootstrap grid introduction is a very nice resource (Click Here)

Customizing Your Own Theme

Web developer Robert Iwancz made a great bare bones framework on which one can build out almost anything. (Click Here for his VoidyNullness Theme)

Using Pelican for a Static Landing Page

Find your .md markdown file that you want to use as your home page content and add the following metadata to the top of the file: (the second change is to prevent your home page from appearing in the menu twice, depending on your theme).

save_as: index.html
status: hidden

Then in a separate .md file that will become your blog content, add the metadata identifier at the top to assign the appropriate HTML template from your templates folder:

Title: (Your Blog Title)
Date: (The Date)
Category: Page
Template: (the title of your blog template, without the .html extension

Finally, copy the content generating loops from the default page templates into your new blog html template:

{% block content_body %}
{% block article %}
{% if articles %}
  {% for article in (articles_page.object_list if articles_page else articles) %}
&lt;article&gt;
  {% for file in CUSTOM_INDEX_ARTICLE_HEADERS %}
    {% include "includes/" + file %}
  {% else %}
    {% include "includes/article_header.html" %}
  {% endfor %}

  {% if ARTICLE_FULL_FIRST is defined and loop.first and not articles_page.has_previous() %}
    &lt;div class="content-body"&gt;
    {% if article.standfirst %}
      &lt;p class="standfirst"&gt;{{ article.standfirst|e }}&lt;/p&gt;
    {% endif %}
    {{ article.content }}
    {% include "includes/comments.html" %}
  &lt;/div&gt;
  {% else %}
    {% include "includes/index_summary.html" %}
  {% endif %}
&lt;/article&gt;

&lt;hr /&gt;
  {% endfor %}
{% endif %}
{% endblock article %}

{% block pagination %}
&lt;nav class="index-pager"&gt;
{% if articles_page and articles_paginator.num_pages &gt; 1 %}
    &lt;ul class="pagination"&gt;
    {% if articles_page.has_previous() %}
        &lt;li class="prev"&gt;
          &lt;a href="{{ SITEURL }}/{{ articles_previous_page.url }}"&gt;
        &lt;i class="fa fa-chevron-circle-left fa-fw fa-lg"&gt;&lt;/i&gt; Previous
      &lt;/a&gt;
    &lt;/li&gt;
{% else %}
    &lt;li class="prev disabled"&gt;&lt;span&gt;
        &lt;i class="fa fa-chevron-circle-left fa-fw fa-lg"&gt;&lt;/i&gt; 
        Previous&lt;/span&gt;
    &lt;/li&gt;
{% endif %}

{% for num in articles_paginator.page_range %}
  {% if num == articles_page.number %}
    &lt;li class="active"&gt; &lt;span&gt;{{ num }}&lt;/span&gt; &lt;/li&gt;
  {% else %}
    &lt;li&gt;
      &lt;a href="{{ SITEURL }}/{{ articles_paginator.page(num).url }}"&gt;{{ num }}&lt;/a&gt;
    &lt;/li&gt;
  {% endif %}
{% endfor %}

{% if articles_page.has_next() %}
    &lt;li class="next"&gt;
      &lt;a href="{{ SITEURL }}/{{ articles_next_page.url }}"&gt;
        Next &lt;i class="fa fa-chevron-circle-right fa-fw fa-lg"&gt;&lt;/i&gt;
      &lt;/a&gt;
    &lt;/li&gt;
{% else %}
    &lt;li class …

Read more...


The Pros and Cons of Using a Static Site Generator

Posted by Mischa Fisher in Technology   
Sat 12 September 2015

In a recent effort to re-gear this site toward the quantitative and away from the strictly artistic, I rebuilt the site from scratch with one singular aim: make posting effortless. In that spirit, I was directed by a good friend toward static site generators; a new development in the 8 or so years since I had last looked at any of the technologies surrounding web development.

With the new site up and running, here is a brief list of the pros and cons, as I see them, of using a static site generator:

Pros:

  1. Effortless posting:

Write posts in Markdown, then upload with a few keystrokes straight from the terminal.

  1. Cheap and scalable hosting:

I'm using Amazon's S3 for hosting, and Route 53 for DNS services. They're almost free in low traffic, and infinitely scalable in high traffic.

  1. No backend to maintain:

PHP, SQL, and the slow load times and unresponsiveness of shared servers on most hosting plans are a thing of the past. (I'm looking at you GoDaddy.com)

  1. Easy to backup or migrate:

I have all the website files in the location of my laptop being backedup to Dropbox in realtime. In addition, version control through something like Git is also handy, particularly when messing around with the underlying Python scripts that generate the site.

Cons:

  1. Steep learning curve:

The list of technologies one has to look at include: HTML, CSS, Python, JavaScript, Markdown, the unix terminal, Pelican, FontAwesome, Jinja, Bootstrap, s3cmd, pip, brew, and virtual environments.

  1. Longer to get setup:

With a squarespace account you can be up and running in minutes. And it will look a lot prettier by default.

For me those were the biggest pros and cons I weighed in setting this site up. (Your mileage may vary.)

Read more...