Re: Standard Deviation
- From: Tom Backer Johnsen <backer@xxxxxxxxxxxx>
- Date: Wed, 12 Oct 2005 08:13:15 +0200
Nils Haeck wrote:
squared values. And, even on modern machines, you have a potential bomb
in your program. Think about that, shut up, do not mention anything to anyone, and rewrite your programs.
No I won't shut up. The formula I presented is perfectly valid and is indeed a solution to the OP's problem to which you suggested there's no solution at all.
Yes, the formula is valid -- theoretically. Perfectly valid computational formula.
In other words, do not even THINK about using a formula like that in a serious computer program that does statistical computations.
The formula itself is OK, but it assumes unlimited accuracy. The implementation itself must deal with details like how to represent the numbers. Depending on the input data, this might be int64, extended, or perhaps even a large integer or large float class.
That is exactly the point. Unless you are 100% sure about the data the algorithm will be applied to, you should not use the computational formula as the basis. If you look at the formula, you will see that the SS is computed as a relatively small difference between two numbers that can become very large when the number of values thrown at the algorithm is very large. So you are relying on the least accurate part of those two numbers, their low-order digits. Therefore you may, in the extreme case, end up with a negative sum of squares.
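To make the cancellation concrete, here is a minimal Python sketch (Python rather than Delphi, only for brevity). The function names and the sample data are made up for illustration; the values have a large mean and a small spread, which is the bad case for the computational formula. The true sum of squares about the mean is 90.

```python
def ss_computational(xs):
    """Sum of squares about the mean via sum(x^2) - (sum(x))^2 / n."""
    n = len(xs)
    sum_x = 0.0
    sum_sq = 0.0
    for x in xs:
        sum_x += x
        sum_sq += x * x          # grows to about 4e18 for the data below
    # Small difference between two huge, already rounded numbers.
    return sum_sq - (sum_x * sum_x) / n


def ss_two_pass(xs):
    """Sum of squares about the mean, computing the mean first."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


# Hypothetical data: deviations from the mean are -6, -3, 3, 6.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]

print("computational formula:", ss_computational(data))
print("two-pass formula:     ", ss_two_pass(data))   # 90.0
```

In double precision the squares are around 1e18, where adjacent representable values are more than 100 apart, so the computational formula cannot resolve a difference of 90 and may come out zero or negative.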
A better algorithm is the following:
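Avr := 0.0;   // running mean
Std := 0.0;   // running sum of squared deviations from the mean
Val := 0.0;   // number of observations seen so far
for i := 1 to N do begin   // N = the number of observations
  d := sObs[i] - Avr;      // sObs[i] = the i-th observation
  Val := Val + 1.0;
  Avr := Avr + (d / Val);
  Std := Std + (d * (sObs[i] - Avr));
end;

This is a one-pass thing, which is what most want to use, and it is quite accurate, even with single precision. After the loop the average needs no further adjustment, while Std contains the sum of squares and needs to be adjusted if you want the standard deviation: divide it by Val - 1 (or by Val for the population value) and take the square root.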
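For readers who want to try the loop, here is a rough Python transcription of it, including the final adjustment. It is only a sketch: the function names and the sample data are invented for illustration, and dividing by n - 1 assumes you want the sample standard deviation.

```python
import math


def running_mean_and_ss(observations):
    """One pass over the data; returns (count, mean, sum of squares about the mean)."""
    avr = 0.0   # running mean (Avr in the Pascal loop)
    ss = 0.0    # running sum of squared deviations (Std in the Pascal loop)
    n = 0       # observations seen so far (Val in the Pascal loop)
    for x in observations:
        d = x - avr            # deviation from the old mean
        n += 1
        avr += d / n           # update the mean
        ss += d * (x - avr)    # old deviation times new deviation
    return n, avr, ss


def sample_std(observations):
    """Sample standard deviation (divides by n - 1); assumes at least two values."""
    n, _, ss = running_mean_and_ss(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    return math.sqrt(ss / (n - 1))


# Hypothetical data with a large mean and a small spread.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]
n, mean, ss = running_mean_and_ss(data)
print(n, mean, ss)          # 4 1000000010.0 90.0
print(sample_std(data))     # square root of 30, about 5.477
```

Because each update works with deviations from the current mean rather than with raw squares, the intermediate values stay about the size of the spread of the data, not the size of the data itself.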
I use these formulas myself for computation of statistical data in image analysis, where input data consists of int values in the range 0..255 (8bit) or 0..4095 (12bit), and in these cases for any sane bitmap sizes the intermediate results (SumSqrX and SumX) always stay well within the int64 range (each squared value uses at most 24 bits, leaving 64-24 = 40 bits for N, thus a bitmap of 2^20 by 2^20).
There's a good reason to use these formulas in this case: you can calculate the average and standard deviation in one pass, and the intermediate results are computed with integer operations only. Using the SS formula implies you must know the mean first, so it's a two-pass process. Furthermore, you cannot avoid working with floats.
Well, as I said, unless you are VERY sure about the data that will be thrown at the algorithm (which you seem to be), do not use it, ever. It is not recommended for general use. I do see the point about avoiding float operations, though.
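The integer variant discussed above can be sketched roughly as follows, again in Python. Everything here is illustrative: the function names are invented, the names SumX and SumSqrX are taken from the quoted text, and Python integers never overflow, so the bound is checked separately against a signed 64-bit limit (Delphi's Int64 is signed, which leaves 63 bits rather than 64).

```python
INT64_MAX = 2**63 - 1   # largest value of a signed 64-bit integer


def integer_sums(pixels):
    """One pass with integer arithmetic only: returns (N, SumX, SumSqrX)."""
    n = 0
    sum_x = 0
    sum_sq = 0
    for p in pixels:
        n += 1
        sum_x += p
        sum_sq += p * p
    return n, sum_x, sum_sq


def population_variance(n, sum_x, sum_sq):
    """Variance from the integer sums; the numerator is exact in integers."""
    # Note: in a fixed-width language n * sum_sq needs its own overflow check.
    return (n * sum_sq - sum_x * sum_x) / (n * n)


def max_pixels_for_int64(bits_per_pixel):
    """Worst-case pixel count for which SumSqrX is guaranteed to fit in Int64."""
    max_value = 2**bits_per_pixel - 1
    return INT64_MAX // (max_value * max_value)


n, sum_x, sum_sq = integer_sums([0, 255, 0, 255])
print(population_variance(n, sum_x, sum_sq))   # 16256.25, i.e. 127.5 squared
print(max_pixels_for_int64(12))                # between 2**39 and 2**40
print(max_pixels_for_int64(8))                 # at least 2**47
```

With integers there is no rounding, so the result cannot go negative; the only risk is overflow. For 12-bit data the worst-case guarantee with a signed 64-bit sum holds up to somewhat more than 2^39 pixels, which is a little short of a 2^20 by 2^20 bitmap if every pixel were at the maximum value.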
Tom