Re: Standard Deviation



After walking the dog, I would like to add a comment. The computational formula is dangerous on computers when the mean is large and the variance is small relative to the mean. Let me illustrate this with some simple numbers.

Suppose you have five observations and want to compute the sum of the squared deviations from the mean (SS). The observations are the integers from 1 to 5. The sum of the values is 15 and the sum of the squared values is 55. Both sums are accumulated in a single pass over the values. After the loop you square the simple sum, which gives 225, and divide by N = 5, which yields 45. The SS is the sum of the squared values minus this quotient, i.e. 55 - 45 = 10. So far, so good.

Now suppose you add a constant to all the numbers, and operate with data something like the years from 2001 to 2005. The value for the mean should be the same as the mean for the values above plus 2000, while the SS value should be unchanged. Right?

OK. The sum of all the values is now 10015, while the sum of the squared values is 20060055. The square of 10015 is 100300225, which is divided by 5 to yield 20060045, and the SS is 20060055 minus 20060045, which is 10 as before. Note: even in this very simple case with five observations, the square of the sum of the values is a 9-digit number (100300225), and the answer depends entirely on the last two digits of two 8-digit numbers. With a large number of observations of the same type, the intermediate values grow beyond the digits the floating-point format can hold, and accuracy is lost in the last digits, which is exactly where you need it. That is why the algorithm is dangerous.
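Here is a minimal Python sketch of the one-pass computational formula described above, in case anyone wants to see the effect for themselves. The function name ss_computational is mine, purely for illustration. Python floats are double precision (roughly 16 significant digits), so the years 2001 to 2005 still come out right; to provoke the problem I assume a larger offset of 1e9, which is my choice and not part of the example above.

```python
def ss_computational(values):
    """One-pass 'computational formula': sum(x^2) - (sum(x))^2 / N."""
    n = 0
    total = 0.0      # running sum of the values
    total_sq = 0.0   # running sum of the squared values
    for x in values:
        x = float(x)  # force floating point; Python ints would be exact
        n += 1
        total += x
        total_sq += x * x
    return total_sq - total * total / n


small = [1, 2, 3, 4, 5]
years = [2001, 2002, 2003, 2004, 2005]
large = [1e9 + i for i in small]  # assumed offset, chosen to exceed double precision

print(ss_computational(small))  # exact in double precision
print(ss_computational(years))  # still exact in double precision
print(ss_computational(large))  # no longer 10: the last digits are lost
```

With single-precision arithmetic the same breakdown would show up at a much smaller offset, since far fewer digits are carried.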

To reiterate: Using the computational formula is dangerous on computers when the mean is large and the variance is small relative to the mean.

Conclusion: Either use an algorithm like the one in my other message or, as a simple alternative, consider the following. You do not need the exact value of the mean to get reasonable accuracy; a reasonable estimate is enough. One trick is to use the first observation as a constant that is subtracted from all the values before the simple sum and the sum of the squared values are accumulated. This has no effect on the value of the SS other than to increase its accuracy. Then add the constant back to the average of the shifted values to recover the mean before continuing.
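The shifting trick could be sketched roughly as follows in Python. This is only one way to write it; the function name ss_shifted and the 1e9 offset in the usage lines are my own assumptions for illustration.

```python
def ss_shifted(values):
    """One-pass SS with the first observation subtracted from every value.

    Returns (mean, ss). The shift does not change the SS; it only keeps
    the accumulated sums small.
    """
    it = iter(values)
    shift = float(next(it))  # first observation serves as the rough mean
    n = 1                    # the first observation contributes a deviation of 0
    total = 0.0              # running sum of (x - shift)
    total_sq = 0.0           # running sum of (x - shift)^2
    for x in it:
        d = float(x) - shift
        n += 1
        total += d
        total_sq += d * d
    mean = shift + total / n          # add the constant back to the average
    ss = total_sq - total * total / n
    return mean, ss


print(ss_shifted([2001, 2002, 2003, 2004, 2005]))    # (2003.0, 10.0)
print(ss_shifted([1e9 + i for i in range(1, 6)]))    # (1000000003.0, 10.0)
```

The accumulated sums are now of the same order as the deviations themselves, so the subtraction at the end no longer cancels almost all of the digits carried.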


