Re: strange sum behavior

From: Josiah Carlson (jcarlson_at_nospam.uci.edu)
Date: 02/22/04


Date: Sun, 22 Feb 2004 11:09:03 -0800

simon place wrote:

> while playing about with inheriting from list to be able to cache macro
> properties i noticed this, the rate of summing a list seems to be over
> linear?
>
> its nearly 3 times faster to sum the sums of smaller lists?
>
>
> from time import clock
>
> l=range(0,1000000)
>
> start=clock()
> print sum(l),
> print clock()-start
>
> # now sum a list of the sums of 1000 slices.
> start=clock()
> print sum([sum(l[x:x+1000]) for x in xrange(0,len(l),1000)]),
> print clock()-start
>
> # repeat
> start=clock()
> print sum(l),
> print clock()-start
>
> # output from 500MHz AMD K62 64Mb Python 2.3.3 (#51, Dec 18 2003,
> 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
> #499999500000 1.89348044721
> #499999500000 0.731985115406
> #499999500000 1.90818149818

You are confusing the behavior of summing long integers with that of
summing integers.

When you sum from 0 to 65,534, your sum is just under sys.maxint on (32
bit machines). When you sum from 0 to 65,535, your sum is just over
sys.maxint (on 32 bit machines), and becomes a Python Long. It is no
longer a single add instruction when running on the core processor when
adding the running total with the next number, it is multiple
instructions. If you check the implementation, Python ends up doing
manual adds and carries on unsigned C shorts, which makes up each
'digit' in a Python long.

When you break the pieces up into blocks of 1000, the largest sum is for
998,999 to 999,999, which is 999,499,500; which you will notice is
smaller than sys.maxint (on 32 bit machines), and at worst only results
in 1000 Python long adds, as compared with 934464 long additions in your
original method (that factor of 934 times as many long additions is crazy).

The fact that it is /only/ 3 times slower shows us that Python's long
integer implementation for smaller long integers works pretty damn well.

  - Josiah



Relevant Pages

  • Re: sum for sequences?
    ... Python is under no compulsion to make "the obvious way" obvious to anyone ... Joining lists? ... Making sum more efficient is just a band aid. ... I think the very existence of math.fsumalready violates "there ...
    (comp.lang.python)
  • Re: sum for sequences?
    ... Since the + operator takes both numbers and lists, ... Python makes no promises about performance. ... It would be nice if the docs for sum() at least pointed to ... It only takes a handful of sublists, about ten on my box, to expose the ...
    (comp.lang.python)
  • Re: sum for sequences?
    ... This is why many of us program in Python. ... who would never use sum() on lists, EVEN IF IT WERE FIXED TO NOT BE SO ... which *does* cause real problems in real code, ...
    (comp.lang.python)
  • Re: sum for sequences?
    ... like it should work on lists. ... Returns the sum of a sequence of numbers (NOT strings) plus the ... What part of that suggested to you that sum might not be polymorphic? ... Before Python2.6, which introduced a numeric tower, Python *couldn't* ...
    (comp.lang.python)
  • Re: Feature suggestion: sum() ought to use a compensated summation algorithm
    ... In that thread I suggested that Python ought to implement sum by adding ... I never did get round to re-coding my sumpairs() function in C, ... would be faster than the existing sum in some cases such as for lists). ...
    (comp.lang.python)