[Baypiggies] Notes on defaultdict

Glen Jarvis glen at glenjarvis.com
Fri Aug 27 18:23:13 CEST 2010


Brian,
    This was a freakin' awesome critical analysis. I didn't know about the
'has_key' being punted. As I have seen the 'in' used (and liked it), I
didn't realize one was preferred over the other.  I agree that the 'in' is a
much more pythonic way to do this. (I didn't realize it was faster too).
 Very good call.

     When I did the decorator newbie nugget, I also got some *really*
valuable feedback -- for example -- simply don't use "decorator.py" as my
example to import from because it confused everyone when I did "from
decorator import ..." -- they thought it was a built in library. I hadn't
even thought of it, but that was *great* advise and a great learning
experience for me.

     I'm getting the best end of the deal where, when I present something
like this, I get incredible feedback and learn more about it than I ever
would have by just reading on my own...


KUDOS to all and thanks for the feedback. Hopefully this has also been
helpful to everyone else.... It certainly has been for me...

Cheers,


Glen

On Fri, Aug 27, 2010 at 4:00 AM, Brian Harring <ferringb at gmail.com> wrote:

> On Thu, Aug 26, 2010 at 11:16:22PM -0700, Glen Jarvis wrote:
> >    >>> for color, value in input:
> >
> >    ...     if output.has_key(color):
> >
> >    ...         output[color].append(value)
> >
> >    ...     else:
> >
> >    ...         output[color] = [value]
> >
> >    ...
>
> Use 'color in output', avoid the has_key... if this is targeted as an
> intro to python, has_key should *not* be mentioned- it's slower than
> doing 'in' due to method lookup and it was punted from py3k.  Been
> frowned upon for a long while.
>
> Teaching someone new to python, either you can tell them about
> list.index, dict.has_key, or you can tell them about the 'in'
> operator.  Expose new folk to just the latter, they have one rule to
> know- specifically knowing about the containment protocol.
>
> This is generally the pythonic way- who cares if it's a dict or a list
> or a tuple, if it quacks like a duck (does containment), then treat it
> like a duck (it's sequence like).
>
>
> >    What a default dictionary type is, it's simply a subclassed version of
> >    the dictionary that does a few special things.
> >
> >    You give it a basic 'factory function' -- in our case a list -- and
> >    that's what's initialized for a key the first time it is accessed. We
> >    don't have to worry if the key was previously initialized, etc..
>
> Instantiated if the key is missing, rather than 'first time access'.
> It's pedantic correction, but the wording you're using here conflicts
> with-
>
> d = defaultdict(list)
> d[1].append(2)
> del d[1]
> print d[1]
>
>
> >    Some will argue that this is less explicit -- and I would probably
> have
> >    been one of them when I first started preparing this newbie nugget.
> >    But, after doing over this a few times to explain it, I've finally
> >    started to come on board to the idea. We're explicitly stating the
> >    default type when we initialize the defaultdict variable.
> >
> >    I especially like it that the default dict gives a default type for
> any
> >    key -- not for a specific key as setdefault seems to do (correct me if
> >    I misspeak here).
>
> The phrasing/selling point here is questionable; how it reads to me is
> claiming that all key/vals wind up using that factory function
> (trying to sell the notion of 'default type' specifically), which
> isn't the case.  Might want to reword that.
>
> As for setdefault, the phrasing there is further confusing.
> setdefault has no notion of default type... it's purely "if the key
> doesn't exist, use this default".
>
>
> >    But, more than anything, it's *much faster* than doing it the other
> >    ways.. But, I want you to prove that to yourself and to me… I have
> >    created a 1 GB file with colors and numbers. I'll give you this file
> so
> >    that you can read it in and create the dictionary of your choice, with
> >    the method of your choice.
>
>
> You really shouldn't be trying to sell defaultdict on it's speed since
> it's not much of a gurantee.
>
> Consider the following:
>
> python -m timeit -n 1000 -s 'from collections import defaultdict;' \
>  $'d=defaultdict(lambda :[0])\nfor x in xrange(1000):d[x].append(1)'
> 1000 loops, best of 3: 794 usec per loop
>
> # optimized variant.
> python -m timeit -n 1000 -s 'from collections import defaultdict;from
> functools import partial' \
>  $'d=defaultdict(partial(list, [0]))\nfor x in xrange(1000):d[x].append(1)'
> 1000 loops, best of 3: 723 usec per loop
>
> python -m timeit -n 1000 -s 'pass' \
>  $'d={}\nfor x in xrange(1000):d.setdefault(x, [0]).append(1)'
> 1000 loops, best of 3: 546 usec per loop
>
> python -m timeit -n 1000 -s 'pass' \
>  $'d={}\nfor x in xrange(1000):\n if x in d:d[x].append(1)\n
> else:d[x]=[0,1]'
> 1000 loops, best of 3: 282 usec per loop
>
> To be clear, for the scenario of defaultdict(list), defaultdict *is*
> around 20% faster than the optimized dict equivalent (py2.6 timings
> specifically).
>
> Where defaultdict breaks down is when instantiation is semi complex-
> if you have to rely on lambda for example, the frame overhead is going
> to add up pretty quick.   In addition, reliance on defaultdict means
> you lose the ability to do site specific faster versions ([0,1] for
> example).
>
> Pretty much, if you're going to sell folk on defaultdict, sell them on
> code cleanliness:
>
> # consider this:
> d = {}
> for key, (subkey, val) in source_items:
>  if key in d:
>  if subkey in d[key]:
>   d[key][subkey].extend(subval)
>  else:
>   d[key][subkey] = list(subval)
>  else:
>  d[key] = {subkey:list(subval)}
>
> # simplified...
> d = {}
> for key, (subkey, val) in source_items:
>  d.setdefault(key, {}).setdefault(subkey, []).extend(val)
>
> # or using defaultdict
> d = defaultdict(partial(defaultdict, list))
> for key, (subkey, val) in source_items:
>  d[key][subkey].extend(val)
>
>
> While that's an abbreviated example, if one imagines the logic being
> a fair bit more complex for assignment/mangling of subkey/val, the
> slightly tricky defaultdict setup used in the last example is offset
> in maintenance costs pretty quickly via not having to dink around
> with setdefault or equivalent logic.
>
> Hell, consider the following-
>
> def f():
>  return defaultdict(f)
>
> my_tree = f()
> my_tree[1][2][3][4][5][6][7][8][9]
> my_tree[1][3]
>
> Stuff like that really is the power of defaultdict (more the power of
> __missing__, but you seem to be skipping that detail in your nugget
> thingy-mo-bob).
>
> ~harring
>



-- 
Whatever you can do or imagine, begin it;
boldness has beauty, magic, and power in it.

-- Goethe
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/baypiggies/attachments/20100827/efa09f44/attachment-0001.html>


More information about the Baypiggies mailing list