[Python-Dev] Currently baking idea for dict.sequpdate(iterable, value=True)

Mon, 25 Nov 2002 02:02:55 -0500

Here's a write-up for a proposed dictionary update method.
Comments are invited while I work on a patch, docs, tests, etc.

Raymond Hettinger

------------------------------------------------------------------------
METHOD SPECIFICATION

    def sequpdate(self, iterable, value=True):
        for elem in iterable:
            self[elem] = value
        return self

USE CASES

# Fast Membership testing
termwords = dict('End Quit Stop Abort'.split())
 . . .
if lexeme in termwords: sys.exit(0) 

# Removing duplicates
seq = dict().sequpdate(seq).keys() 

# Initializing or resetting mapped accumulators
absences = dict('Tom Dick Harry'.split(), 0)
for name, date in classlog:
    absences[name] += 1   # Intentionally raises KeyError for invalid names

RATIONALE

Examining code in the library, the cookbook, and the vaults of
parnassus, the most common practice for membership testing is
a sequential search of a list, tuple, or string.  Where the
writers were sophisticated and cared about performance, they
used in-line versions of the python fragment shown above.  I have
not found a single case where people took the time to "roll their
own" pure python function as shown above.  Even if they had, the
construction time would take twice as long as with a C coded method.
IOW, practices are tending toward slow or verbose approaches in the
absence of a quick, convenient, standardized tool for moving sequences
into dictionaries.

The problem of removing duplicates from a sequence is most frequently
solved with inline use of a code fragment similar to the one shown
above.  Again, you almost never see people taking the time to build
their own uniq function, factoring all the effort into a single,
tested, reusable code fragment.  The exception is Tim's wonderful,
general purpose recipe for uniquifying anything -- unfortunately, I
don't think you ever see his code used in practice.

Also, since all containers support __iter__, they can be fed to list();
however, in the absence of the above method, they cannot be fed to dict()
without shenanigans for building item lists.  The proposed method makes
todict() as doable as tolist() and encourages switching to appropriate
data structures.

OBJECTIONS TO THE PRIOR IDEA OF EXPANDING THE DICT CONSTRUCTOR (SF 575224)

1. The constructor was already overloaded.
2. Weird syntax was needed to fit with other constructor syntax.
3. The sets module was going to meet all needs.

The first two comments led to this revision in method form.

The Sets module met several needs centering around set mathematics;
however, for membership testing, it is so slow that it is almost
always preferable to use dictionaries instead (even without this
proposed method).  The slowness is intrinsic because of the time
to search for the __contains__ method in the class and the time
to setup a try/except to handle mutable elements.  Another reason
to prefer dictionaries is that there is one less thing to import
and expect readers to understand.  My experiences applying the
Sets module indicates that it will *never* replace dictionaries for
membership testing and will have only infrequent use for uniquification.