Re: Feature suggestion: sum() ought to use a compensated summation algorithm



Szabolcs Horvát <szhorvat@xxxxxxxxx> wrote:
I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.

The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.

However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.

Here's the program (pardon my style, I'm a newbie/occasional user):

from random import random

data = [random() for x in xrange(1000000)]

mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)

A little research shows that Mathematica uses a "compensated summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:

I was taught in my numerical methods course (about 20 years ago now!)
that the way to do this sum with most accuracy is to sum from the
smallest magnitude to the largest magnitude.

Eg

>>> from random import random
>>> data = [random() for x in xrange(1000000)]
>>> mean = sum(data)/len(data)
>>> print sum(x - mean for x in data)/len(data)
1.64152134108e-14
>>> mean = sum(sorted(data))/len(data)
>>> print sum(x - mean for x in data)/len(data)
5.86725534824e-15
>>>

It isn't as good as the compensated sum but it is easy!

I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?

sum() gets used for any numerical types not just floats...

--
Nick Craig-Wood <nick@xxxxxxxxxxxxxx> -- http://www.craig-wood.com/nick
.



Relevant Pages

  • Re: Is there a polynomial algorithm for that problem?
    ... The sum over all n numbers is negative. ... Unfortunately that algorithm does not scale. ... An instance of 3-Partition consists of a list of 3m positive integers ... list can be partitioned into m lists of size 3, ...
    (sci.math)
  • Re: Choose k random lines from file
    ... > What, for the algorithm shown above, or for any algorithm? ... Yes, that is an unbiased generator, but I had thought ... >>give less of a bias using an extra float or double ... >>distribution for the sum. ...
    (comp.programming)
  • Re: standard deviation, but without the mean
    ... Suppose Xsum is the sum of the the first n numbes and that X2 is ... Just keep track if X2 and Xsum as you go. ... Actually this is not a correct algorithm, ... the error in the least 9 digits.. ...
    (sci.stat.math)
  • Re: C# generic containers from a "C++ perspective"
    ... If you implement the floating-point and integer algorithms ... Maintain the sum of all the items in an accumulator. ... That is the fundamentally-broken algorithm that I was expecting you to ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Disjoint circle merge NP complete for L^n error?
    ... The key point is that we need to be sure there is no optimal algorithm ... We only need to erase MAX and replace with SUM in the proof ... minimum error partition) algorithm in P. ... MAX. I'm pretty sure that if both these are in NP then all L^n metrics ...
    (comp.theory)