Re: Pre-PEP: Dictionary accumulator methods

From: Francis Girard (francis.girard_at_free.fr)
Date: 03/20/05


To: python-list@python.org, Raymond Hettinger <python@rcn.com>
Date: Sun, 20 Mar 2005 20:49:13 +0100

Hi,

I really do not like it. So -1 for me. Your two methods are very specialized
whereas the dict type is very generic. Usually, when I see something like
this in code, I can smell it's a patch to overcome some shortcomings on a
previous design, thereby making the economy of re-designing. Simply said,
that's bad programming.

After that patch to provide a solution for only two of the more common
use-cases, you are nonetheless stucked with the old solution for all the
other use-cases (what if the value type is another dictionary or some
user-made class ?).

Here's an alternate solution which I think answers all of the problems you
mentionned while being generic.

=== BEGIN SNAP

def update_or_another_great_name(self, key, createFunc, updtFunc):
  try:
    self[key] <<<= updtFunc(self[key])
      ## This is "slow" with Python "=" since the key has to be searched
      ## twice But the new built-in method just has to update the value the
      ## first time the key is found. Therefore speed should be ok.
    return True
  except KeyError:
    self[key] = createFunc()
    return false

## Now your two specialized methods can be easily written as :

## A built-in should be provided for this (if not already proposed) :
def identical(val):
  return val

def count(self, key, qty=1):
  self.update_or_another_great_name(key, identical,
                                    partial(operator.add, qty))
  ## partial is coming from : http://www.python.org/peps/pep-0309.html
  ## Using only built-in function (assuming "identical") as arguments makes it
  ## ok for speed (I guess).
  
def appendlist(self, key, *values):
  self.update_or_another_great_name(key,
                                    partial(list, values),
                                    partial(ListType.extend, X = values))
  ## The first "partial" usage here is an abuse just to make sure that the
  ## list is not actually constructed before needed. It should work.
  ## The second usage is more uncertain as we need to bind the arguments from
  ## the right. Therefore I have to use the name of the parameter and I am not
  ## sure if there's one. As this list is very prolific, someone might have an
  ## idea on how to improve this.
  
=== END SNAP

By using only built-in constructs, this should be fast enough. Otherwise,
optimizing these built-ins is a much more clean and sane way of thinking then
messing the API with ad-hoc propositions.

Reviewing the problems you mention :

> The readability issues with the existing constructs are:
>
> * They are awkward to teach, create, read, and review.

The method update_or_another_great_name is easy to understand, I think. But it
might not always be easy to use it efficiently with built-ins. But this is
always the case. "Recipees" can be added to show how to efficiently use the
method.

> * Their wording tends to hide the real meaning (accumulation).

Solved.

> * The meaning of setdefault() 's method name is not self-evident.

Solved.

>
> The performance issues with the existing constructs are:
>
> * They translate into many opcodes which slows them considerably.

I really don't know what will be the outcome of the solution I propose. I
certainly do not know anything about how my Python code translates into
opcodes.

> * The get() idiom requires two dictionary lookups of the same key.

Solved

> * The setdefault() idiom instantiates a new, empty list prior to every

Solved

> call. * That new list is often not needed and is immediately discarded.

Solved

> * The setdefault() idiom requires an attribute lookup for extend/append.

Solved

> * The setdefault() idiom makes two function calls.

Solved

And perhaps, what you say here is also true for your two special use-cases :

> For other
> uses, plain Python code suffices in terms of speed, clarity, and avoiding
> unnecessary instantiation of empty containers:
>
> if key not in d:
> d.key = {subkey:value}
> else:
> d[key][subkey] = value

Much better than adding special cases on a generic class. Special cases always
demultiply and if we open the door ....

Regards,

Francis Girard

Le samedi 19 Mars 2005 02:24, Raymond Hettinger a écrit :
> I would like to get everyone's thoughts on two new dictionary methods:
>
> def count(self, value, qty=1):
> try:
> self[key] += qty
> except KeyError:
> self[key] = qty
>
> def appendlist(self, key, *values):
> try:
> self[key].extend(values)
> except KeyError:
> self[key] = list(values)
>
> The rationale is to replace the awkward and slow existing idioms for
> dictionary based accumulation:
>
> d[key] = d.get(key, 0) + qty
> d.setdefault(key, []).extend(values)
>
> In simplest form, those two statements would now be coded more readably as:
>
> d.count(key)
> d.appendlist(key, value)
>
> In their multi-value forms, they would now be coded as:
>
> d.count(key, qty)
> d.appendlist(key, *values)
>
> The error messages returned by the new methods are the same as those
> returned by the existing idioms.
>
> The get() method would continue to exist because it is useful for
> applications other than accumulation.
>
> The setdefault() method would continue to exist but would likely not make
> it into Py3.0.
>
>
> PROBLEMS BEING SOLVED
> ---------------------
>
> The readability issues with the existing constructs are:
>
> * They are awkward to teach, create, read, and review.
> * Their wording tends to hide the real meaning (accumulation).
> * The meaning of setdefault() 's method name is not self-evident.
>
> The performance issues with the existing constructs are:
>
> * They translate into many opcodes which slows them considerably.
> * The get() idiom requires two dictionary lookups of the same key.
> * The setdefault() idiom instantiates a new, empty list prior to every
> call. * That new list is often not needed and is immediately discarded.
> * The setdefault() idiom requires an attribute lookup for extend/append.
> * The setdefault() idiom makes two function calls.
>
> The latter issues are evident from a disassembly:
> >>> dis(compile('d[key] = d.get(key, 0) + qty', '', 'exec'))
>
> 1 0 LOAD_NAME 0 (d)
> 3 LOAD_ATTR 1 (get)
> 6 LOAD_NAME 2 (key)
> 9 LOAD_CONST 0 (0)
> 12 CALL_FUNCTION 2
> 15 LOAD_NAME 3 (qty)
> 18 BINARY_ADD
> 19 LOAD_NAME 0 (d)
> 22 LOAD_NAME 2 (key)
> 25 STORE_SUBSCR
> 26 LOAD_CONST 1 (None)
> 29 RETURN_VALUE
>
> >>> dis(compile('d.setdefault(key, []).extend(values)', '', 'exec'))
>
> 1 0 LOAD_NAME 0 (d)
> 3 LOAD_ATTR 1 (setdefault)
> 6 LOAD_NAME 2 (key)
> 9 BUILD_LIST 0
> 12 CALL_FUNCTION 2
> 15 LOAD_ATTR 3 (extend)
> 18 LOAD_NAME 4 (values)
> 21 CALL_FUNCTION 1
> 24 POP_TOP
> 25 LOAD_CONST 0 (None)
> 28 RETURN_VALUE
>
> In contrast, the proposed methods use only a single attribute lookup and
> function call, they use only one dictionary lookup, they use very few
> opcodes, and they directly access the accumulation functions,
> PyNumber_Add() or PyList_Append(). IOW, the performance improvement
> matches the readability improvement.
>
>
> ISSUES
> ------
>
> The proposed names could possibly be improved (perhaps tally() is more
> active and clear than count()).
>
> The appendlist() method is not as versatile as setdefault() which can be
> used with other object types (perhaps for creating dictionaries of
> dictionaries). However, most uses I've seen are with lists. For other
> uses, plain Python code suffices in terms of speed, clarity, and avoiding
> unnecessary instantiation of empty containers:
>
> if key not in d:
> d.key = {subkey:value}
> else:
> d[key][subkey] = value
>
>
>
> Raymond Hettinger



Relevant Pages

  • Re: comments on my design of a new language?
    ... > exist, plus strings and the function type, and that is it. ... indexed by a string or a regular expression, ... > dictionaries, that terminate with literals, or (possibly ... and lists. ...
    (comp.lang.misc)
  • Re: comments on my design of a new language?
    ... > indexed by a string or a regular expression, ... Trees are essentially nested dictionaries, ... I was originally going to treat scalars and unit lists equivalently, ... > dictionaries being indexed by strings or by regular expressions. ...
    (comp.lang.misc)
  • Re: objects as mutable dictionary keys
    ... for lists, and it isn't defined by default for user ... >> dictionaries recently, and the mainstream opinion was that dictionary ... >> keys must not be mutable, so lists are not allowed as dictionary keys. ... > - Dictionary lookup with list literals would be impossible. ...
    (comp.lang.python)
  • Re: AUE in the dictionaries
    ... OSPD2 lists words of length two to eight letters in their base forms. ... >even when OSPD was official for tournament use, multiple dictionaries were ...
    (alt.usage.english)
  • Re: Why are tuples immutable?
    ... but yet the fact that lists are mutable and tuples ... as dictionaries are one example. ... in the second we just allow mutables to ... > Except that Python uses dicts internally, ...
    (comp.lang.python)

Loading