For an estimator to be good we usually look for it to be unbiased. This means the estimator is expected to equal the population value (i.e. $E[\hat{\theta}] = \theta$). Why is the naïve estimator of the population variance biased? If $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is biased, how could we possibly do better?
While it is true that $E[\bar{x}] = \mu$, or equivalently $E[\bar{x} - \mu] = 0$, it is not true that $E[(\bar{x} - \mu)^2] = 0$. This is by definition $\operatorname{Var}(\bar{x}) = \sigma^2/n$. Therefore every squared deviation from the sample mean underestimates the squared deviation from the population mean by $\sigma^2/n$ on average, so $E[\hat{\sigma}^2] = \sigma^2 - \sigma^2/n = \frac{n-1}{n}\sigma^2$. Taking this into account (rescaling by $\frac{n}{n-1}$) we arrive at the unbiased estimator $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, known as Bessel's correction.
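The bias is easy to check numerically. Here is a minimal sketch using only the standard library: draw many small samples from a population with known variance and average each estimator across samples (the population parameters and sample size below are illustrative choices, not from the text):

```python
import random

def naive_var(xs):
    """Biased estimator: divides by n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def bessel_var(xs):
    """Unbiased estimator: divides by n - 1 (Bessel's correction)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

random.seed(0)
mu, sigma, n, trials = 0.0, 1.0, 5, 100_000  # true variance is 1.0

naive_avg = 0.0
bessel_avg = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    naive_avg += naive_var(xs) / trials
    bessel_avg += bessel_var(xs) / trials

print(naive_avg)   # ≈ (n - 1)/n * sigma^2 = 0.8
print(bessel_avg)  # ≈ sigma^2 = 1.0
```

With $n = 5$ the naive estimator settles near $\frac{4}{5}\sigma^2$, exactly the shortfall the derivation above predicts, while the corrected estimator settles near $\sigma^2$.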
Full details of this proof are available on Wikipedia.
Degrees of freedom
Another less concrete perspective on Bessel’s correction employs the notion of degrees of freedom. I dislike this explanation because it’s quite hand-wavy, but I will provide it for completeness.
Imagine a sample of data with a missing value: $\{x_1, x_2, \dots, x_{n-1}, x_n\}$, where $x_n$ is unknown. Let's say we know the mean of this sample, $\bar{x}$. Can we figure out what the missing data point is? Of course, it's simple algebra: $x_n = n\bar{x} - \sum_{i=1}^{n-1} x_i$.
When we say a value is free to vary we're saying that, given certain constraints, the value is not determined. In this case $x_n$ is determined by the system: given the mean, we know exactly what the missing data point has to be. The variable is not free to vary, which means this system has no degrees of freedom.
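The algebra can be made concrete with a short sketch (the sample values and mean below are made up for illustration):

```python
def recover_missing(known, mean):
    """Given all but one value and the sample mean, solve for the
    missing value: x_n = n * mean - sum(known)."""
    n = len(known) + 1
    return n * mean - sum(known)

known = [1.0, 2.0, 6.0]  # hypothetical observed values
mean = 3.0               # known mean of all four values
x_missing = recover_missing(known, mean)
print(x_missing)  # 3.0, since (1 + 2 + 6 + 3) / 4 == 3
```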
What about two missing data points: $\{x_1, \dots, x_{n-2}, x_{n-1}, x_n\}$ with both $x_{n-1}$ and $x_n$ unknown? Without an additional equation relating $x_{n-1}$ to $x_n$ we cannot find an exact value for either variable. But how many degrees of freedom are there? Well, if we set $x_{n-1}$ to a value then $x_n$ cannot vary, since the mean forces $x_n = n\bar{x} - \sum_{i=1}^{n-1} x_i$. So we say that this system has one degree of freedom, because we're only free to vary one variable.
How many degrees of freedom are there in the estimator $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$? Ostensibly there are $n$, but this is not true. We can derive a constraint: the deviations from the sample mean always sum to zero, since $\sum_{i=1}^{n}(x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0$.
Given values for the deviations $x_1 - \bar{x}, \dots, x_{n-1} - \bar{x}$, the last deviation $x_n - \bar{x}$ is determined by the constraint. This means there are only $n - 1$ free measurements of deviation, because only these measurements are free to vary. Once again we arrive at Bessel's correction: divide by $n - 1$ rather than $n$.
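The constraint is easy to verify numerically: deviations from the sample mean sum to zero (up to floating-point rounding), so the last deviation is pinned down by the first $n - 1$. A quick sketch with arbitrary data:

```python
import random

random.seed(1)
xs = [random.uniform(0, 10) for _ in range(6)]
mean = sum(xs) / len(xs)
devs = [x - mean for x in xs]

# The constraint: deviations from the sample mean sum to zero.
total = sum(devs)
print(abs(total) < 1e-9)  # True

# So the last deviation is determined by the other n - 1:
last_from_constraint = -sum(devs[:-1])
print(abs(last_from_constraint - devs[-1]) < 1e-9)  # True
```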