[Baypiggies] Notes on defaultdict
Zach Collins
recursive.cookie.jar at gmail.com
Fri Aug 27 14:39:41 CEST 2010
Personally, I'm frequently tempted to just
try:
output[color].append(value)
except:
output[color] = [value]
It's pythonic because I'm a lazy arse :P
On Aug 27, 2010, at 6:00 AM, Brian Harring wrote:
> On Thu, Aug 26, 2010 at 11:16:22PM -0700, Glen Jarvis wrote:
>>>>> for color, value in input:
>>
>> ... if output.has_key(color):
>>
>> ... output[color].append(value)
>>
>> ... else:
>>
>> ... output[color] = [value]
>>
>> ...
>
> Use 'color in output', avoid the has_key... if this is targeted as an
> intro to python, has_key should *not* be mentioned- it's slower than
> doing 'in' due to method lookup and it was punted from py3k. Been
> frowned upon for a long while.
>
> Teaching someone new to python, either you can tell them about
> list.index, dict.has_key, or you can tell them about the 'in'
> operator. Expose new folk to just the latter, they have one rule to
> know- specifically knowing about the containment protocol.
>
> This is generally the pythonic way- who cares if it's a dict or a list
> or a tuple, if it quacks like a duck (does containment), then treat it
> like a duck (it's sequence like).
>
>
>> What a default dictionary type is, it's simply a subclassed version of
>> the dictionary that does a few special things.
>>
>> You give it a basic 'factory function' -- in our case a list -- and
>> that's what's initialized for a key the first time it is accessed. We
>> don't have to worry if the key was previously initialized, etc..
>
> Instantiated if the key is missing, rather than 'first time access'.
> It's pedantic correction, but the wording you're using here conflicts
> with-
>
> d = defaultdict(list)
> d[1].append(2)
> del d[1]
> print d[1]
>
>
>> Some will argue that this is less explicit -- and I would probably have
>> been one of them when I first started preparing this newbie nugget.
>> But, after doing over this a few times to explain it, I've finally
>> started to come on board to the idea. We're explicitly stating the
>> default type when we initialize the defaultdict variable.
>>
>> I especially like it that the default dict gives a default type for any
>> key -- not for a specific key as setdefault seems to do (correct me if
>> I misspeak here).
>
> The phrasing/selling point here is questionable; how it reads to me is
> claiming that all key/vals wind up using that factory function
> (trying to sell the notion of 'default type' specifically), which
> isn't the case. Might want to reword that.
>
> As for setdefault, the phrasing there is further confusing.
> setdefault has no notion of default type... it's purely "if the key
> doesn't exist, use this default".
>
>
>> But, more than anything, it's *much faster* than doing it the other
>> ways.. But, I want you to prove that to yourself and to me… I have
>> created a 1 GB file with colors and numbers. I'll give you this file so
>> that you can read it in and create the dictionary of your choice, with
>> the method of your choice.
>
>
> You really shouldn't be trying to sell defaultdict on it's speed since
> it's not much of a gurantee.
>
> Consider the following:
>
> python -m timeit -n 1000 -s 'from collections import defaultdict;' \
> $'d=defaultdict(lambda :[0])\nfor x in xrange(1000):d[x].append(1)'
> 1000 loops, best of 3: 794 usec per loop
>
> # optimized variant.
> python -m timeit -n 1000 -s 'from collections import defaultdict;from functools import partial' \
> $'d=defaultdict(partial(list, [0]))\nfor x in xrange(1000):d[x].append(1)'
> 1000 loops, best of 3: 723 usec per loop
>
> python -m timeit -n 1000 -s 'pass' \
> $'d={}\nfor x in xrange(1000):d.setdefault(x, [0]).append(1)'
> 1000 loops, best of 3: 546 usec per loop
>
> python -m timeit -n 1000 -s 'pass' \
> $'d={}\nfor x in xrange(1000):\n if x in d:d[x].append(1)\n else:d[x]=[0,1]'
> 1000 loops, best of 3: 282 usec per loop
>
> To be clear, for the scenario of defaultdict(list), defaultdict *is*
> around 20% faster than the optimized dict equivalent (py2.6 timings
> specifically).
>
> Where defaultdict breaks down is when instantiation is semi complex-
> if you have to rely on lambda for example, the frame overhead is going
> to add up pretty quick. In addition, reliance on defaultdict means
> you lose the ability to do site specific faster versions ([0,1] for
> example).
>
> Pretty much, if you're going to sell folk on defaultdict, sell them on
> code cleanliness:
>
> # consider this:
> d = {}
> for key, (subkey, val) in source_items:
> if key in d:
> if subkey in d[key]:
> d[key][subkey].extend(subval)
> else:
> d[key][subkey] = list(subval)
> else:
> d[key] = {subkey:list(subval)}
>
> # simplified...
> d = {}
> for key, (subkey, val) in source_items:
> d.setdefault(key, {}).setdefault(subkey, []).extend(val)
>
> # or using defaultdict
> d = defaultdict(partial(defaultdict, list))
> for key, (subkey, val) in source_items:
> d[key][subkey].extend(val)
>
>
> While that's an abbreviated example, if one imagines the logic being
> a fair bit more complex for assignment/mangling of subkey/val, the
> slightly tricky defaultdict setup used in the last example is offset
> in maintenance costs pretty quickly via not having to dink around
> with setdefault or equivalent logic.
>
> Hell, consider the following-
>
> def f():
> return defaultdict(f)
>
> my_tree = f()
> my_tree[1][2][3][4][5][6][7][8][9]
> my_tree[1][3]
>
> Stuff like that really is the power of defaultdict (more the power of
> __missing__, but you seem to be skipping that detail in your nugget
> thingy-mo-bob).
>
> ~harring
> _______________________________________________
> Baypiggies mailing list
> Baypiggies at python.org
> To change your subscription options or unsubscribe:
> http://mail.python.org/mailman/listinfo/baypiggies
More information about the Baypiggies
mailing list