Pre-PEP: Dictionary accumulator methods

Fri Mar 18 20:24:57 EST 2005

I would like to get everyone's thoughts on two new dictionary methods:

        def count(self, value, qty=1):
            try:
                self[key] += qty
            except KeyError:
                self[key] = qty

        def appendlist(self, key, *values):
            try:
                self[key].extend(values)
            except KeyError:
                self[key] = list(values)

The rationale is to replace the awkward and slow existing idioms for dictionary
based accumulation:

    d[key] = d.get(key, 0) + qty
    d.setdefault(key, []).extend(values)

In simplest form, those two statements would now be coded more readably as:

   d.count(key)
   d.appendlist(key, value)

In their multi-value forms, they would now be coded as:

  d.count(key, qty)
  d.appendlist(key, *values)

The error messages returned by the new methods are the same as those returned by
the existing idioms.

The get() method would continue to exist because it is useful for applications
other than accumulation.

The setdefault() method would continue to exist but would likely not make it
into Py3.0.

PROBLEMS BEING SOLVED
---------------------

The readability issues with the existing constructs are:

* They are awkward to teach, create, read, and review.
* Their wording tends to hide the real meaning (accumulation).
* The meaning of setdefault() 's method name is not self-evident.

The performance issues with the existing constructs are:

* They translate into many opcodes which slows them considerably.
* The get() idiom requires two dictionary lookups of the same key.
* The setdefault() idiom instantiates a new, empty list prior to every call.
* That new list is often not needed and is immediately discarded.
* The setdefault() idiom requires an attribute lookup for extend/append.
* The setdefault() idiom makes two function calls.

The latter issues are evident from a disassembly:

>>> dis(compile('d[key] = d.get(key, 0) + qty', '', 'exec'))
  1           0 LOAD_NAME                0 (d)
              3 LOAD_ATTR                1 (get)
              6 LOAD_NAME                2 (key)
              9 LOAD_CONST               0 (0)
             12 CALL_FUNCTION            2
             15 LOAD_NAME                3 (qty)
             18 BINARY_ADD
             19 LOAD_NAME                0 (d)
             22 LOAD_NAME                2 (key)
             25 STORE_SUBSCR
             26 LOAD_CONST               1 (None)
             29 RETURN_VALUE

>>> dis(compile('d.setdefault(key, []).extend(values)', '', 'exec'))
  1           0 LOAD_NAME                0 (d)
              3 LOAD_ATTR                1 (setdefault)
              6 LOAD_NAME                2 (key)
              9 BUILD_LIST               0
             12 CALL_FUNCTION            2
             15 LOAD_ATTR                3 (extend)
             18 LOAD_NAME                4 (values)
             21 CALL_FUNCTION            1
             24 POP_TOP
             25 LOAD_CONST               0 (None)
             28 RETURN_VALUE

In contrast, the proposed methods use only a single attribute lookup and
function call, they use only one dictionary lookup, they use very few opcodes,
and they directly access the accumulation functions, PyNumber_Add() or
PyList_Append().  IOW, the performance improvement matches the readability
improvement.

ISSUES
------

The proposed names could possibly be improved (perhaps tally() is more active
and clear than count()).

The appendlist() method is not as versatile as setdefault() which can be used
with other object types (perhaps for creating dictionaries of dictionaries).
However, most uses I've seen are with lists.  For other uses, plain Python code
suffices in terms of speed, clarity, and avoiding unnecessary instantiation of
empty containers:

    if key not in d:
        d.key = {subkey:value}
    else:
        d[key][subkey] = value

Raymond Hettinger