[Python-Dev] defaultdict proposal round three

Ian Bicking ianb at colorstudy.com
Mon Feb 20 22:13:23 CET 2006


Alex Martelli wrote:
>>I prefer this approach over subclassing.  The mental load from an
>>additional method is less than the load from a separate type (even a
>>subclass).  Also, avoidance of invariant issues is a big plus.  Besides,
>>if this allows setdefault() to be deprecated, it becomes an all-around win.
> 
> 
> I'd love to remove setdefault in 3.0 -- but I don't think it can be  
> done before that: default_factory won't cover the occasional use  
> cases where setdefault is called with different defaults at different  
> locations, and, rare as those cases may be, any 2.* should not break  
> any existing code that uses that approach.
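Here is a small sketch of the "different defaults at different locations" case. It uses collections.defaultdict, the type that Python 2.5 added later, rather than the round-three proposal itself. The registry and its keys are hypothetical. A plain dict lets each call site pick its own default through setdefault. A defaultdict applies one factory to every missing key.

```python
from collections import defaultdict

# Hypothetical registry shared by two unrelated call sites.
registry = {}
# Call site 1 wants a list under its key...
registry.setdefault("handlers", []).append("log_request")
# ...call site 2 wants a set under another key.
registry.setdefault("seen_ids", set()).add(42)

# A defaultdict has a single factory for every missing key, so both
# call sites get the same default type.
dd = defaultdict(list)
dd["handlers"].append("log_request")
print(type(dd["seen_ids"]).__name__)  # prints "list", not "set"
```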

Would it be deprecated during 2.*, or would deprecation only start in 3.0?

Also, is default_factory=list threadsafe in the same way .setdefault is?
That is, with setdefault you can safely do this from multiple threads:

   d.setdefault(key, []).append(value)

I believe this is safe with very few caveats -- setdefault itself is 
atomic (or else I'm writing some bad code ;).  My impression is that 
default_factory will not generally be threadsafe in the way setdefault 
is.  For instance:

   def make_list(): return []
   d = dict()
   # proposed API: plain dicts have no default_factory attribute or getdef()
   d.default_factory = make_list
   # from multiple threads:
   d.getdef(key).append(value)

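This would not be correct.  Suppose two threads look up the same missing key
at the same time.  Each one calls make_list and stores its own new list, the
second store replaces the first, and the value appended to the replaced list
is lost.  In the case of default_factory=list (the list builtin), is the
story different?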
Will this work on Jython, IronPython, or PyPy?  Will this be a 
documented guarantee?  Or alternately, are we just creating a new way to 
punish people who use threads?  And if we push threadsafety up to user 
code, are we trading a very small speed issue (creating lists that are 
thrown away) for a much larger speed issue (acquiring a lock)?
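To make that trade-off concrete, here are two hypothetical answers sketched on top of collections.defaultdict. That is the type Python 2.5 later added, not the proposal discussed here. Both class names are made up for illustration, and both assume default_factory is not None.

- The first class takes a lock whenever a key is missing.
- The second class uses no lock. It relies on dict.setdefault being atomic, as this message assumes, and may build values that are then discarded.

```python
import threading
from collections import defaultdict


class LockingDefaultDict(defaultdict):
    """Hypothetical: serialize creation of missing values with a lock."""

    def __init__(self, default_factory):
        super().__init__(default_factory)
        self._lock = threading.Lock()

    def __missing__(self, key):
        # Only reached on a miss, so existing keys are read without the
        # lock, but every miss pays for acquiring it.
        with self._lock:
            if key in self:  # another thread created it while we waited
                return dict.__getitem__(self, key)
            value = self.default_factory()
            self[key] = value
            return value


class SetdefaultDefaultDict(defaultdict):
    """Hypothetical: no lock; assumes dict.setdefault is atomic."""

    def __missing__(self, key):
        # Racing threads may each call the factory, but setdefault keeps only
        # the first value stored and returns that same object to every
        # caller; the other freshly built values are simply thrown away.
        return self.setdefault(key, self.default_factory())
```

LockingDefaultDict pays for a lock on every miss. SetdefaultDefaultDict instead pays for building a value that may be discarded. These are the two costs weighed in the question above.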

I tried to make a test for this threadsafety, actually, using a technique
other than setdefault that I knew was unsafe (try: ... except KeyError:).
Except by using time.sleep(), which is cheating, I wasn't able to trigger
the bug.  That is frustrating, because I know the bug is there.  So
apparently threadsafety is hard to test in this case.  (If anyone is
interested in trying it, I can email what I have.)
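Here is a sketch of such a test, assuming Python 3.2+ for threading.Barrier. The `barrier` parameter is a hypothetical test hook and not part of the idiom. It does the job of time.sleep() by parking every thread between the failed lookup and the store, so the bad interleaving happens on every run. Like sleeping, this is still "cheating". It shows that the lost update can occur, not how likely it is under a normal schedule.

```python
import threading


def racy_append(d, key, value, barrier=None):
    # The try/except KeyError idiom: check, then create-and-store.
    try:
        lst = d[key]
    except KeyError:
        if barrier is not None:
            # Hypothetical test hook: hold every thread here until all of
            # them have seen the key as missing.
            barrier.wait()
        lst = d[key] = []  # each thread stores its own fresh list
    lst.append(value)      # appends to any list a later store replaced are lost


def run_race(n_threads):
    d = {}
    barrier = threading.Barrier(n_threads)
    threads = [
        threading.Thread(target=racy_append, args=(d, "k", i, barrier))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return d


result = run_race(4)
print(len(result["k"]))  # prints 1: three of the four appended values were lost
```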

Note that multidict -- among other possible concrete collection patterns 
(like Bag, OrderedDict, or others) -- can be readily implemented with 
threading guarantees.
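To illustrate that claim, here is a minimal sketch of a multidict whose add() is safe to call from many threads. The class name MultiDict and its methods add(), getall() and keys() are assumptions for illustration, not an existing stdlib API. The class takes an explicit threading.Lock, so it does not depend on any interpreter-specific atomicity of dict or list operations.

```python
import threading


class MultiDict:
    """Hypothetical multidict: each key maps to a list of values."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def add(self, key, value):
        # The whole check-create-append sequence runs under one lock.
        with self._lock:
            values = self._data.get(key)
            if values is None:
                values = self._data[key] = []
            values.append(value)

    def getall(self, key):
        # Return a copy so callers never hold a list another thread mutates.
        with self._lock:
            return list(self._data.get(key, ()))

    def keys(self):
        with self._lock:
            return list(self._data)


md = MultiDict()
md.add("color", "red")
md.add("color", "blue")
print(md.getall("color"))  # prints ['red', 'blue']
```

A single coarse lock is the simplest way to get the guarantee. The cost is that operations on unrelated keys also wait on each other.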

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

