[Python-Dev] Proposal: defaultdict

"Martin v. Löwis" martin at v.loewis.de
Sat Feb 18 08:33:35 CET 2006


Ian Bicking wrote:
> Well, here's a kind of an example: WSGI specifies that the environment
> must be a dictionary, and nothing but a dictionary.  I think it would
> have to be updated to say that it must be a dictionary with
> default_factory not set, as default_factory would break the
> predictability that was the reason WSGI specified exactly a dictionary
> (and not a dictionary-like object or subclass).  So there's something
> that becomes brokenish.

I don't understand. In the rationale of PEP 333, it says
"The rationale for requiring a dictionary is to maximize portability
between servers. The alternative would be to define some subset of a
dictionary's methods as being the standard and portable interface."

That rationale is not endangered: if the environment continues to
be a dict exactly, servers continue to be guaranteed what precise
set of operations is available on the environment.

Of course, that may change from Python version to Python version,
as new dict methods get defined. But that should have been clear
when the PEP was written: the dict type itself may evolve, providing
additional features that weren't present in earlier versions.
Even now, some dict implementations have setdefault(), others
don't.

> KeyError is one of
> those errors that you *expect* to happen (maybe the "Error" part is a
> misnomer); having it disappear is a major change.

Well, as you say: you get a KeyError if there is an error with the key.
With a default_factory, there isn't normally an error with the key.
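
To make the difference concrete, here is a minimal sketch. It assumes the
collections.defaultdict type that later shipped in Python 2.5, not the
exact API under discussion in this thread; the variable names are
illustrative only.

```python
from collections import defaultdict

# A plain dict still reports a missing key as an error.
plain = {}
try:
    plain["missing"]
    plain_raised = False
except KeyError:
    plain_raised = True

# With a default_factory, the lookup calls the factory, stores the
# result under the key, and returns it; no KeyError is raised.
dd = defaultdict(list)
value = dd["missing"]
dd["missing"].append(1)

# Membership tests and get() do not call the factory.
absent_before = "other" in dd
fetched = dd.get("other")
absent_after = "other" in dd
```

Code that needs to distinguish "absent" from "present" can still use
`in` or get(), neither of which triggers the factory.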

> Also, I believe there's two ways to handle thread safety, both of which
> are broken:
> 
> 1) d[key] gets the GIL, and thus while default_factory is being called
> the GIL is locked
> 
> 2) d[key] doesn't get the GIL and so d[key].append(1) may not actually
> lead to 1 being in d[key] if another thread is appending something to
> the same key at the same time, and the key is not yet present in d.

It's 1), primarily. If default_factory is written in Python, though
(e.g. if it is *not* the built-in list), the interpreter will give up
the GIL every N bytecode instructions (or when a blocking operation is
executed).

Notice the same issue already exists with __hash__ for the key.

Also notice that the same issue already exists with any kind of
manipulation of a dictionary in multiple threads, today: if you
do

try:
    d[k].append(v)
except KeyError:
    d[k] = [v]

then two threads might interleave while executing the except-suite,
and the second assignment overwrites the list stored by the first.
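
One way to close that window, sketched under the assumption that every
writer goes through the same threading.Lock (the names `lock` and
`append_safely` are hypothetical, not something proposed in this thread):

```python
import threading

d = {}
lock = threading.Lock()  # hypothetical lock shared by all writers


def append_safely(k, v):
    # Hold the lock across both the lookup and the insert, so no other
    # caller can run its own except-suite in between.
    with lock:
        try:
            d[k].append(v)
        except KeyError:
            d[k] = [v]


threads = [
    threading.Thread(target=append_safely, args=("key", i))
    for i in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This only helps if no code touches d[k] outside the lock; it is a sketch
of the general remedy, not a statement about what dict itself guarantees.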

Regards,
Martin

