Re: Standard Deviation
- From: Tom Backer Johnsen <backer@xxxxxxxxxxxx>
- Date: Wed, 12 Oct 2005 09:23:59 +0200
After walking the dog, I would like to add a comment. The computational formula is dangerous on computers when the mean is large and the variance is small relative to the mean. Let me illustrate this with some simple numbers.
Suppose you have five observations and want to compute the sum of the squared deviations from the mean (SS). The observations are the integers from 1 to 5. The sum of the values is 15 and the sum of the squared values is 55. Both sums are accumulated in a single pass over the data. After the loop you square the simple sum, which gives 225, and divide by N, which yields 45. The SS is the sum of the squared values minus this quantity, i.e. 55 - 45 = 10. So far, so good.
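The one-pass calculation just described can be sketched in a few lines of Python. This is only an illustration; the function name ss_computational is made up for the example.

```python
def ss_computational(values):
    """One-pass sum of squared deviations: sum(x^2) - (sum(x))^2 / N."""
    n = 0
    total = 0.0      # running sum of the values
    total_sq = 0.0   # running sum of the squared values
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    # Square the simple sum, divide by N, subtract from the sum of squares.
    return total_sq - (total * total) / n


print(ss_computational([1, 2, 3, 4, 5]))  # 10.0
```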
Now suppose you add a constant to all the numbers, and operate with data something like the years from 2001 to 2005. The value for the mean should be the same as the mean for the values above plus 2000, while the SS value should be unchanged. Right?
OK. The sum of all the values is now 10015, while the sum of the squared values is 20060055. Now the square of 10015 is 100300225, which is divided by 5 to yield 20060045, and the SS is 20060055 minus 20060045, which is 10 as before. Note: even in this very simple case with five observations, the square of the sum of the values is a 9-digit number (100300225), and the SS is the small difference between two 8-digit numbers that agree in all but the last two digits. With a large number of observations of the same type, the intermediate values quickly exceed the precision of the floating point type, and the loss of accuracy is in the last digits, which is exactly where the result is. Therefore that algorithm is dangerous.
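To see the effect on a real machine, here is a small Python sketch. Python floats are double precision, so the five years above still come out exact; to make the loss visible with only five values, the sketch assumes a much larger constant (1e9, chosen purely for the demonstration). The two-pass function is included only as a comparison and is not necessarily the algorithm from the other message.

```python
def ss_computational(values):
    """One-pass formula: sum(x^2) - (sum(x))^2 / N."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in values:
        n += 1
        total += x
        total_sq += x * x
    return total_sq - (total * total) / n


def ss_two_pass(values):
    """Comparison: compute the mean first, then sum the squared deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values)


years = [2001.0, 2002.0, 2003.0, 2004.0, 2005.0]
offset = 1e9  # assumed constant, large enough to exhaust double precision
big = [offset + k for k in range(1, 6)]

print(ss_computational(years))  # 10.0, still exact in double precision
print(ss_computational(big))    # not 10: the last digits were rounded away
print(ss_two_pass(big))         # 10.0
```

With the larger constant the sums of squares are around 5e18, where neighbouring doubles are 1024 apart, so the one-pass result cannot be 10 no matter how the rounding falls.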
To reiterate: Using the computational formula is dangerous on computers when the mean is large and the variance is small relative to the mean.
Conclusion: Either use an algorithm like the one in my other message or, as a simple alternative, consider the following. You do not need the exact value of the mean to get reasonable accuracy; a reasonable estimate is enough. One trick is to use the first observation as a constant that is subtracted from every value before accumulating the simple sum and the sum of the squared values. This has no effect on the SS other than to increase its accuracy. Then add the constant back to the average before continuing.
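A minimal Python sketch of this trick follows, assuming the data are in a non-empty list. The function name shifted_mean_and_ss is only illustrative.

```python
def shifted_mean_and_ss(values):
    """One-pass mean and SS, using the first observation as a shift constant."""
    shift = values[0]  # rough estimate of the mean
    n = 0
    sum_d = 0.0   # sum of the shifted values
    sum_d2 = 0.0  # sum of the squared shifted values
    for x in values:
        d = x - shift
        n += 1
        sum_d += d
        sum_d2 += d * d
    mean = shift + sum_d / n               # add the constant back
    ss = sum_d2 - (sum_d * sum_d) / n      # unaffected by the shift
    return mean, ss


print(shifted_mean_and_ss([2001, 2002, 2003, 2004, 2005]))  # (2003.0, 10.0)
```

The accumulated sums now stay small (10 and 30 for the years above), so the subtraction at the end no longer cancels large, nearly equal numbers. If the first observation happens to be far from the mean, the gain is smaller, but it is still a single pass.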