Re: Microbenchmark: Summing over array of doubles

From: Yaroslav Bulatov (bulatov_at_engr.orst.edu)
Date: 08/03/04


Date: 2 Aug 2004 16:32:21 -0700

Duncan Booth <me@privacy.net> wrote in message news:<Xns95395FD0EF85duncanrcpcouk@127.0.0.1>...
> Christopher T King <squirrel@WPI.EDU> wrote in
> news:Pine.LNX.4.44.0408011840050.21160-100000@ccc4.wpi.edu:
>
> > On 1 Aug 2004, Duncan Booth wrote:
> >
> >> I just had a brief look at the code you posted. Are you not concerned
> >> about accuracy in any of your calculations? Summing a 10 million
> >> element array by simply accumulating each element into a running
> >> total is guaranteed to give a lousy result.
> >
> > Lousy or not, I believe that's how numarray is implemented internally,
> > so at least all the benchmarks are the same. If you want accuracy
> > summing that many numbers, you're going to have to do it in software
> > (likely by summing each mantissa bit individually and reconstructing
> > the float afterward), so it will be abysmally slow (obviously not what
> > the OP wants).
> >
>
> My point being that speed isn't everything. Most applications doing large
> floating point calculations should be concerned about accuracy, and how not
> to add up a large array of floating point numbers was one of the first
> things I was taught in computer science classes. The fastest way to sum 10
> million numbers (albeit at some considerable loss of accuracy):
>
> return 0
>
>
> The 'correct' way to sum a large array of floats is, of course, to
> calculate a lot of partial sums and sum them. For example, naively, we
> might say that to sum the array we sum the first half, sum the second half
> and add the results together. This definition being recursive ensures that
> if the inputs are all of a similar size then all the intermediate
> calculations involve summing similarly sized values. A more sophisticated
> technique might also try to handle the case where not all values are a
> similar size.
>
> If you are worried about speed, then calculating partial sums is also the
> way to go: the naive technique used by the original poster leaves no scope
> for parallel calculation. It should be possible to write a slightly more
> complex loop in the C version that runs a lot faster on systems that are
> capable of performing multiple floating point instructions in parallel.

You are right, naive summing generates significant accuracy losses.
I estimated the error by summing [0.123456789012345]*1000000 and
comparing it to
1234567.89012345. All methods have error about 1e-4 .

The method below sums the array at the same speed as regular Python
sum loop, but reports error < 1e-15 .

def sum2(arr):
  size = len(arr)
  for itr in range(int(ceil(log(size)/log(2)))):
    for i in xrange(0, size, 2**(itr+1)):
      next_i = i+2**itr
      if next_i<size:
        arr[i]+=arr[next_i]
  return arr[0]

When arbitrary precision numbers are being used, this method has
performance O(nd) vs. regular summing O(n(d+log d)) where n is size of
array, d is number of digits of the elements. In practice I got about
20% performance increase when summing an array of 1000 Decimals using
method above.

A better estimate of error difference would use random digits as
opposed to [x]*10000000, but I don't know how I would calculate exact
answer in any reasonable amount of time. (using Decimals takes over a
second for 4000 array (with conversion))

Yaroslav



Relevant Pages

  • Re: Microbenchmark: Summing over array of doubles
    ... > summing that many numbers, you're going to have to do it in software ... floating point calculations should be concerned about accuracy, ... to add up a large array of floating point numbers was one of the first ... The 'correct' way to sum a large array of floats is, of course, to ...
    (comp.lang.python)
  • Re: Subset Sum problem (w/ limited scope)
    ... The "Subset Sum" problem (stated various ways depending on what you ... a subset of those integers that sums up to zero. ... Are there going to be duplicate values in your original array? ...
    (comp.lang.fortran)
  • Re: How can I find the greatest possible sum within 12 months? A newbie...
    ... It may be because you have headers, and when I tied it first I omitted ... You may have omitted to enter it as an array formula (see notes on ... the starting sum is in A2 and the end sum is in A123. ... Excel will insert them for you. ...
    (microsoft.public.excel.worksheet.functions)
  • Re: Explanation of SUMPRODUCT
    ... I don't even think you need to enter it as an array ... Yes, we know this, but in this case SUM is an array formula SUMPRODUCT ... > each element is the number of times the corresponding cell value appears ...
    (microsoft.public.excel.programming)
  • Re: Simple but confusing algorith question
    ... As you mention that this is an interview question ... ... through the array, then query against the structure somehow. ... structure and the cost of querying the data structure. ... array 1,2,3,4 sum = 5 ...
    (microsoft.public.dotnet.languages.csharp)