Re: map/filter/reduce/lambda opinions and background unscientific mini-survey



Steven D'Aprano wrote:
On Sun, 03 Jul 2005 19:31:02 +0000, Ron Adam wrote:


First on removing reduce:

1. There is no reason why reduce can't be put in a functional module


Don't disagree with that.


or you can write the equivalent yourself. It's not that hard to do, so it isn't that big of a deal to not have it as a built in.


Same goes for sum. Same goes for product, ...

Each item needs to stand on its own. It's a much stronger argument for removing something because something else fulfills its need and is easier or faster to use than just saying we need x because we have y.


In this case sum and product fulfill 90% (an estimate, of course) of reduce's use cases. It may actually be as high as 99% for all I know. Or it may be less. Anyone care to try and put a real measurement on it?
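To make the overlap concrete, here is a minimal sketch of the two common cases. The product() helper is hypothetical (no such builtin exists), and reduce is imported from functools only so the snippet also runs on Python 3; in Python 2 it is a builtin.

```python
import operator
from functools import reduce  # builtin in Python 2; import keeps this portable

values = [1, 2, 3, 4, 5]

# Summing: the reduce spelling and the dedicated builtin give the same result.
total_reduce = reduce(operator.add, values, 0)
total_sum = sum(values)

# Hypothetical product() helper, written with reduce since there is no builtin.
def product(seq, start=1):
    """Multiply the items of seq together, starting from start."""
    return reduce(operator.mul, seq, start)

total_product = product(values)
```

Anything that is not a plain total or product (say, folding with a custom function) would still need reduce or an explicit loop.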


which doesn't have that many
common usages apart from calculating the geometric mean, and let's face
it, most developers don't even know what the geometric mean _is_.

I'm neutral on adding product myself.
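For reference, the geometric mean of n positive numbers is the n-th root of their product. A minimal sketch using reduce follows; the function name and the error handling are assumptions for illustration, and functools.reduce is imported so it also runs on Python 3.

```python
import operator
from functools import reduce  # builtin in Python 2; import keeps this portable

def geometric_mean(values):
    """Return the n-th root of the product of n positive numbers."""
    values = list(values)
    if not values:
        raise ValueError("geometric_mean requires at least one value")
    # Start from 1.0 so the product is computed in floating point.
    product = reduce(operator.mul, values, 1.0)
    return product ** (1.0 / len(values))
```

For long lists the running product can overflow or underflow a float, so summing logarithms is usually the safer route; the version above is only meant to show where a product fits in.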


If you look back at past discussions about sum, you will see that there is
plenty of disagreement about how it should work when given non-numeric
arguments, eg strings, lists, etc. So it isn't so clear what sum should do.

Testing shows sum() to be over twice as fast as either using reduce or a for-loop. I think the disagreements will be sorted out.
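Anyone who wants to repeat that comparison could start from a rough sketch like the one below. It is not the test that produced the figure above: the data size, the helper names loop_total and time_call, and the use of the timeit module are all assumptions, and the numbers will vary by machine and Python version.

```python
import operator
import timeit
from functools import reduce  # builtin in Python 2; import keeps this portable

data = list(range(10000))

def loop_total(seq):
    """Total a sequence with an explicit for-loop."""
    total = 0
    for item in seq:
        total += item
    return total

def time_call(func, repeat=3, number=20):
    """Return the best of `repeat` timings, each running func `number` times."""
    return min(timeit.repeat(func, repeat=repeat, number=number))

timings = {
    "sum": time_call(lambda: sum(data)),
    "reduce": time_call(lambda: reduce(operator.add, data)),
    "for-loop": time_call(lambda: loop_total(data)),
}

for name in ("sum", "reduce", "for-loop"):
    print("%-8s %.6f" % (name, timings[name]))
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes.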



2. Reduce calls a function on every item in the list, so its performance isn't much better than the equivalent code using a for-loop.
That is an optimization issue. Especially when used with the operator
module, reduce and map can be significantly faster than for loops.

I tried it... it made about a 1% improvement in the builtin reduce and an equal improvement in the function that used the for loop.


The inline for loop also performed about the same.

See below..


*** (Note that list.sort() has the same problem. I would support replacing it with a sort that uses an optional 'order-list' as a sort key. I think its performance could be increased a great deal by removing the function call reference.) ***
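One possible reading of the 'order-list' idea is a sort driven by a parallel list of precomputed keys, so that no user function is called during the sort itself. The sketch below is only an illustration of that reading; sort_by_order is a hypothetical name, not an existing or proposed API.

```python
def sort_by_order(items, order):
    """Return items sorted according to the parallel list of keys `order`."""
    if len(items) != len(order):
        raise ValueError("items and order must have the same length")
    # Pair each key with its index so ties never compare the items themselves.
    decorated = sorted(zip(order, range(len(items))))
    return [items[index] for _, index in decorated]
```

For example, sort_by_order(["pear", "fig", "apple"], [3, 1, 2]) puts "fig" first because its key is the smallest. This is essentially the decorate-sort-undecorate idiom with the keys supplied up front.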


Second, the addition of sum & product:

1. Sum, and less so Product, are fairly common operations so they have plenty of use case arguments for including them.

Disagree about product, although given that sum is in the language, it doesn't hurt to add product as well for completeness and those few usages.

I'm not convinced about product either, but if I were to review my statistics textbooks, I could probably find more uses for it. I suspect that there may be a few common uses for it that are frequent enough to make it worth adding. But it might be better in a module.



2. They don't need to call a pre-defined function between every item, so they can be completely handled internally by C code. They will be much much faster than equivalent code using reduce or a for-loop. This represents a speed increase for every program that totals or subtotals a list, or finds a product of a set.

I don't object to adding sum and product to the language. I don't object to adding zip. I don't object to list comps. Functional, er, functions are a good thing. We should have more of them, not less.

Yes, we should have lots of functions to use, in the library, but not necessarily in builtins.


But removing reduce is just removing
functionality for no other reason, it seems, than spite.

No, not for spite. It's more a matter of increasing the overall performance and usefulness of Python without making it more complicated. In order to add new stuff that is better thought out, some things will need to be removed or else the language will continue to grow and be another Visual Basic.
Another slippery slope argument.

Do you disagree or agree? Or are you undecided?


Having sum and product built in has a clear advantage in both
performance and potential frequency of use, whereas reduce doesn't have
the same performance advantage and most people don't use it anyway, so
why have it built in if sum and product are?

Because it is already there.

Hmm.. I know a few folks, good people, but they keep everything to the point of not being able to find anything because they have so much. They can always think of reasons to keep things: "It's worth something", "it means something to me", "I'm going to fix it", "I'm going to sell it", "I might need it", etc.


"Because it is already there" sounds like one of those types of reasons.


Why not just code it as a
function and put it in your own module?

Yes, let's all re-invent the wheel in every module! Why bother having a print statement, when it is so easy to write your own:

import sys

def myprint(obj):
    sys.stdout.write(str(obj))

Yes, Guido wants to make print a function in Python 3000. The good thing about this is you can call your function just 'p' and save some typing.


p("hello world")

Actually, I think I/O functions should be grouped in an interface module. That way you choose the interface that best fits your need. It may have a print if it's a console, or it may have a widget if it's a GUI.


Best of all, you can customize print to do anything you like, _and_ it is
a function.


    def reduce( f, seq):
        x = 0
        for y in seq:
            x = f(x,y)
        return x


Because that is far less readable, and you take a performance hit.

They come out pretty close as far as I can tell.


def reduce_f(f, seq):
    x = seq[0]
    for y in seq[1:]:
        x = f(x, y)
    return x

import time

t = time.time()
r2 = reduce(lambda x,y: x*y, range(1,10000))
t2 = time.time()-t
print 'reduce builtin:', t2

t = time.time()
r1 = reduce_f(lambda x,y: x*y, range(1,10000))
t2 = time.time()-t
print 'reduce_f:      ', t2

if r1!=r2: print "results not equal"

>>>
reduce builtin: 0.156000137329
reduce_f:       0.155999898911
>>>
reduce builtin: 0.15700006485
reduce_f:       0.155999898911
>>>
reduce builtin: 0.141000032425
reduce_f:       0.155999898911
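The hand-written reduce_f above slices its argument, so it needs an indexable sequence and copies it once. A variant that accepts any iterable and an optional start value might look like the sketch below; reduce_iter and the _MISSING sentinel are hypothetical names, and it needs Python 2.6 or later for the next() builtin.

```python
_MISSING = object()  # sentinel: distinguishes "no initial value" from None

def reduce_iter(func, iterable, initial=_MISSING):
    """Fold func over iterable from the left, like the builtin reduce."""
    it = iter(iterable)
    if initial is _MISSING:
        try:
            # With no start value, the first item seeds the result.
            result = next(it)
        except StopIteration:
            raise TypeError("reduce_iter() of empty iterable with no initial value")
    else:
        result = initial
    for item in it:
        result = func(result, item)
    return result
```

Seeding from the first item rather than from 0 matters for multiplication: starting at 0 would make every product come out as 0.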



But I suspect that most people would just do what I currently do and
write the for-loop to do what they want directly instead of using lambda
in reduce.

That's your choice. I'm not suggesting we remove for loops and force you to use reduce. Or even list comps.

Just don't force me to use decorators! ;-)

Nah, they're ok too, but it did take me a little while to understand their finer points.

Cheers,
Ron