Saturday, January 7, 2023

Calculating Standard Deviation


The "classic" formulas aren't always the "easiest to program" formulas. Let's look at the classical formula to calculate a sample standard deviation:

σ = √( Σ(xᵢ - x̅)² / (n-1) )

where σ is the standard deviation, x̅ is the mean, and n is the sample count. But note that we have to walk through the numbers twice: once to calculate the mean, and then again for the rest of the calculation. And yet old-fashioned scientific calculators with just a handful of registers (memory locations) could do this calculation by just entering a column of numbers. What gives?
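To make the two walks concrete, here is a minimal Python sketch of the classic formula. The function name `two_pass_std_dev` is made up for illustration, and it assumes the whole sample fits in a list:

```python
import math


def two_pass_std_dev(samples):
    """Sample standard deviation using the classic two-pass formula."""
    n = len(samples)
    if n < 2:
        # The (n - 1) divisor is zero or negative below two samples.
        raise ValueError("need at least two samples")

    # First pass: the mean.
    mean = sum(samples) / n

    # Second pass: sum of squared differences from the mean.
    sum_squared_diffs = sum((x - mean) ** 2 for x in samples)

    return math.sqrt(sum_squared_diffs / (n - 1))
```

Note that every sample has to be kept around until the mean is known, which is exactly what a calculator with a handful of registers can't do.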

The answer is that although conceptually you are summing the squared differences between each sample and the mean, you can actually do the calculation differently as long as you keep track of n, the sum of x, and the sum of x². The resulting calculation is:

σ = √( (Σ(xᵢ²) - (Σxᵢ)² / n) / (n-1) )
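If you want to see why the two forms agree, expand the square and substitute the definition of the mean, x̅ = (Σxᵢ)/n. A short derivation in LaTeX notation, with all sums running over i = 1..n:

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 \;-\; 2\bar{x}\sum_{i=1}^{n} x_i \;+\; n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 \;-\; \frac{2\left(\sum_{i=1}^{n} x_i\right)^2}{n}
     \;+\; \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \\
  &= \sum_{i=1}^{n} x_i^2 \;-\; \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
\end{align*}
```

Dividing by (n-1) and taking the square root gives the one-pass formula.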

Or, slightly more computery, for each new x:

1. n++

2. sumX += x;

3. sumXSquared += x*x;

4. varianceEstimate = (sumXSquared - ((sumX*sumX) / n)) / (n-1)

5. stdDevEstimate = SQRT(varianceEstimate)

And you get a running estimate for the standard deviation. You also get the variance along the way, though that's needed less often.
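The five steps above could be sketched in Python roughly like this. The class name `RunningStdDev` and its methods are made up for illustration. The check for fewer than two samples and the clamp against a slightly negative numerator are additions of this sketch, not part of the steps as listed:

```python
import math


class RunningStdDev:
    """Hypothetical helper that tracks only n, the sum of x, and the sum of x squared."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x_squared = 0.0

    def add(self, x):
        # Steps 1-3: update the three running totals.
        self.n += 1
        self.sum_x += x
        self.sum_x_squared += x * x

    def variance(self):
        # Step 4. Assumption: refuse to answer below two samples,
        # since the (n - 1) divisor would be zero.
        if self.n < 2:
            raise ValueError("need at least two samples")
        numerator = self.sum_x_squared - (self.sum_x * self.sum_x) / self.n
        # Assumption: floating-point rounding can leave a tiny negative
        # numerator when all samples are nearly equal, so clamp at zero.
        return max(numerator, 0.0) / (self.n - 1)

    def std_dev(self):
        # Step 5.
        return math.sqrt(self.variance())


running = RunningStdDev()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    running.add(x)
print(running.std_dev())  # roughly 2.138, i.e. sqrt(32 / 7)
```

One caveat: with floating-point numbers, subtracting two large, nearly equal totals can lose precision when the mean is large compared to the spread, so treat this as a sketch of the idea rather than a numerically robust routine.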

Statisticians call these the "estimates" because the statistical theory is that there's a magic, universal reality for the actual population variance and standard deviation for which these samples provide an estimate. They aren't wrong :-)
