[Baypiggies] Notes on defaultdict

Zach Collins recursive.cookie.jar at gmail.com
Fri Aug 27 14:39:41 CEST 2010


Personally, I'm frequently tempted to just

try:
    output[color].append(value)
except:
    output[color] = [value]

It's pythonic because I'm a lazy arse :P

On Aug 27, 2010, at 6:00 AM, Brian Harring wrote:

> On Thu, Aug 26, 2010 at 11:16:22PM -0700, Glen Jarvis wrote:
>>>>> for color, value in input:
>> 
>>   ...     if output.has_key(color):
>> 
>>   ...         output[color].append(value)
>> 
>>   ...     else:
>> 
>>   ...         output[color] = [value]
>> 
>>   ...
> 
> Use 'color in output', avoid the has_key... if this is targeted as an 
> intro to python, has_key should *not* be mentioned- it's slower than 
> doing 'in' due to method lookup and it was punted from py3k.  Been 
> frowned upon for a long while.
> 
> Teaching someone new to python, either you can tell them about 
> list.index, dict.has_key, or you can tell them about the 'in' 
> operator.  Expose new folk to just the latter, they have one rule to 
> know- specifically knowing about the containment protocol.
> 
> This is generally the pythonic way- who cares if it's a dict or a list 
> or a tuple, if it quacks like a duck (does containment), then treat it 
> like a duck (it's sequence like).
> 
> 
>>   What a default dictionary type is, it's simply a subclassed version of
>>   the dictionary that does a few special things.
>> 
>>   You give it a basic 'factory function' -- in our case a list -- and
>>   that's what's initialized for a key the first time it is accessed. We
>>   don't have to worry if the key was previously initialized, etc..
> 
> Instantiated if the key is missing, rather than 'first time access'.  
> It's pedantic correction, but the wording you're using here conflicts 
> with-
> 
> d = defaultdict(list)
> d[1].append(2)
> del d[1]
> print d[1]
> 
> 
>>   Some will argue that this is less explicit -- and I would probably have
>>   been one of them when I first started preparing this newbie nugget.
>>   But, after doing over this a few times to explain it, I've finally
>>   started to come on board to the idea. We're explicitly stating the
>>   default type when we initialize the defaultdict variable.
>> 
>>   I especially like it that the default dict gives a default type for any
>>   key -- not for a specific key as setdefault seems to do (correct me if
>>   I misspeak here).
> 
> The phrasing/selling point here is questionable; how it reads to me is 
> claiming that all key/vals wind up using that factory function 
> (trying to sell the notion of 'default type' specifically), which 
> isn't the case.  Might want to reword that.
> 
> As for setdefault, the phrasing there is further confusing.  
> setdefault has no notion of default type... it's purely "if the key 
> doesn't exist, use this default".
> 
> 
>>   But, more than anything, it's *much faster* than doing it the other
>>   ways.. But, I want you to prove that to yourself and to me… I have
>>   created a 1 GB file with colors and numbers. I'll give you this file so
>>   that you can read it in and create the dictionary of your choice, with
>>   the method of your choice.
> 
> 
> You really shouldn't be trying to sell defaultdict on it's speed since 
> it's not much of a gurantee.
> 
> Consider the following:
> 
> python -m timeit -n 1000 -s 'from collections import defaultdict;' \
> $'d=defaultdict(lambda :[0])\nfor x in xrange(1000):d[x].append(1)'
> 1000 loops, best of 3: 794 usec per loop
> 
> # optimized variant.
> python -m timeit -n 1000 -s 'from collections import defaultdict;from functools import partial' \
> $'d=defaultdict(partial(list, [0]))\nfor x in xrange(1000):d[x].append(1)'
> 1000 loops, best of 3: 723 usec per loop
> 
> python -m timeit -n 1000 -s 'pass' \
>  $'d={}\nfor x in xrange(1000):d.setdefault(x, [0]).append(1)'
> 1000 loops, best of 3: 546 usec per loop
> 
> python -m timeit -n 1000 -s 'pass' \
> $'d={}\nfor x in xrange(1000):\n if x in d:d[x].append(1)\n else:d[x]=[0,1]'
> 1000 loops, best of 3: 282 usec per loop
> 
> To be clear, for the scenario of defaultdict(list), defaultdict *is* 
> around 20% faster than the optimized dict equivalent (py2.6 timings 
> specifically).
> 
> Where defaultdict breaks down is when instantiation is semi complex- 
> if you have to rely on lambda for example, the frame overhead is going 
> to add up pretty quick.   In addition, reliance on defaultdict means 
> you lose the ability to do site specific faster versions ([0,1] for 
> example).
> 
> Pretty much, if you're going to sell folk on defaultdict, sell them on 
> code cleanliness:
> 
> # consider this:
> d = {}
> for key, (subkey, val) in source_items:
> if key in d:
>  if subkey in d[key]:
>   d[key][subkey].extend(subval)
>  else:
>   d[key][subkey] = list(subval)
> else:
>  d[key] = {subkey:list(subval)}
> 
> # simplified...
> d = {}
> for key, (subkey, val) in source_items:
> d.setdefault(key, {}).setdefault(subkey, []).extend(val)
> 
> # or using defaultdict
> d = defaultdict(partial(defaultdict, list))
> for key, (subkey, val) in source_items:
> d[key][subkey].extend(val)
> 
> 
> While that's an abbreviated example, if one imagines the logic being 
> a fair bit more complex for assignment/mangling of subkey/val, the 
> slightly tricky defaultdict setup used in the last example is offset 
> in maintenance costs pretty quickly via not having to dink around 
> with setdefault or equivalent logic.
> 
> Hell, consider the following-
> 
> def f():
> return defaultdict(f)
> 
> my_tree = f()
> my_tree[1][2][3][4][5][6][7][8][9]
> my_tree[1][3]
> 
> Stuff like that really is the power of defaultdict (more the power of 
> __missing__, but you seem to be skipping that detail in your nugget 
> thingy-mo-bob).
> 
> ~harring
> _______________________________________________
> Baypiggies mailing list
> Baypiggies at python.org
> To change your subscription options or unsubscribe:
> http://mail.python.org/mailman/listinfo/baypiggies



More information about the Baypiggies mailing list