[Baypiggies] Notes on defaultdict

Glen Jarvis glen at glenjarvis.com
Fri Aug 27 18:30:13 CEST 2010


Zach,
   I hear ya :)  The if color in output: blah; else: blah  is the simplest
way -- and the way I was most comfortable with until trying to put this
newbie nugget together...

  There are a couple of things in the try/except that being lazy will get
you in trouble -- the most notable, in my opinion, is the 'except' on any
condition.

    What if, for whatever reason, the output[color].append(value) takes a
very long time (minutes) for some strange scenario that we can cook up?   If
you want to interrupt it during this time, no Keyboard Interruption will be
excepted as all exceptions are caught....

    This is also against the PEP-8 coding standard:

 - When catching exceptions, mention specific exceptions
      whenever possible instead of using a bare 'except:' clause.

      For example, use:

          try:
              import platform_specific_module
          except ImportError:
              platform_specific_module = None

      A bare 'except:' clause will catch SystemExit and
KeyboardInterrupt exceptions, making it harder to interrupt a program
with Control-C, and can disguise other problems. If you want to catch
all exceptions that signal program errors, use 'except Exception:'.


2010/8/27 Zach Collins <recursive.cookie.jar at gmail.com>

> Personally, I'm frequently tempted to just
>
> try:
>    output[color].append(value)
> except:
>    output[color] = [value]
>
> It's pythonic because I'm a lazy arse :P
>
> On Aug 27, 2010, at 6:00 AM, Brian Harring wrote:
>
> > On Thu, Aug 26, 2010 at 11:16:22PM -0700, Glen Jarvis wrote:
> >>>>> for color, value in input:
> >>
> >>   ...     if output.has_key(color):
> >>
> >>   ...         output[color].append(value)
> >>
> >>   ...     else:
> >>
> >>   ...         output[color] = [value]
> >>
> >>   ...
> >
> > Use 'color in output', avoid the has_key... if this is targeted as an
> > intro to python, has_key should *not* be mentioned- it's slower than
> > doing 'in' due to method lookup and it was punted from py3k.  Been
> > frowned upon for a long while.
> >
> > Teaching someone new to python, either you can tell them about
> > list.index, dict.has_key, or you can tell them about the 'in'
> > operator.  Expose new folk to just the latter, they have one rule to
> > know- specifically knowing about the containment protocol.
> >
> > This is generally the pythonic way- who cares if it's a dict or a list
> > or a tuple, if it quacks like a duck (does containment), then treat it
> > like a duck (it's sequence like).
> >
> >
> >>   What a default dictionary type is, it's simply a subclassed version of
> >>   the dictionary that does a few special things.
> >>
> >>   You give it a basic 'factory function' -- in our case a list -- and
> >>   that's what's initialized for a key the first time it is accessed. We
> >>   don't have to worry if the key was previously initialized, etc..
> >
> > Instantiated if the key is missing, rather than 'first time access'.
> > It's pedantic correction, but the wording you're using here conflicts
> > with-
> >
> > d = defaultdict(list)
> > d[1].append(2)
> > del d[1]
> > print d[1]
> >
> >
> >>   Some will argue that this is less explicit -- and I would probably
> have
> >>   been one of them when I first started preparing this newbie nugget.
> >>   But, after doing over this a few times to explain it, I've finally
> >>   started to come on board to the idea. We're explicitly stating the
> >>   default type when we initialize the defaultdict variable.
> >>
> >>   I especially like it that the default dict gives a default type for
> any
> >>   key -- not for a specific key as setdefault seems to do (correct me if
> >>   I misspeak here).
> >
> > The phrasing/selling point here is questionable; how it reads to me is
> > claiming that all key/vals wind up using that factory function
> > (trying to sell the notion of 'default type' specifically), which
> > isn't the case.  Might want to reword that.
> >
> > As for setdefault, the phrasing there is further confusing.
> > setdefault has no notion of default type... it's purely "if the key
> > doesn't exist, use this default".
> >
> >
> >>   But, more than anything, it's *much faster* than doing it the other
> >>   ways.. But, I want you to prove that to yourself and to me… I have
> >>   created a 1 GB file with colors and numbers. I'll give you this file
> so
> >>   that you can read it in and create the dictionary of your choice, with
> >>   the method of your choice.
> >
> >
> > You really shouldn't be trying to sell defaultdict on it's speed since
> > it's not much of a gurantee.
> >
> > Consider the following:
> >
> > python -m timeit -n 1000 -s 'from collections import defaultdict;' \
> > $'d=defaultdict(lambda :[0])\nfor x in xrange(1000):d[x].append(1)'
> > 1000 loops, best of 3: 794 usec per loop
> >
> > # optimized variant.
> > python -m timeit -n 1000 -s 'from collections import defaultdict;from
> functools import partial' \
> > $'d=defaultdict(partial(list, [0]))\nfor x in
> xrange(1000):d[x].append(1)'
> > 1000 loops, best of 3: 723 usec per loop
> >
> > python -m timeit -n 1000 -s 'pass' \
> >  $'d={}\nfor x in xrange(1000):d.setdefault(x, [0]).append(1)'
> > 1000 loops, best of 3: 546 usec per loop
> >
> > python -m timeit -n 1000 -s 'pass' \
> > $'d={}\nfor x in xrange(1000):\n if x in d:d[x].append(1)\n
> else:d[x]=[0,1]'
> > 1000 loops, best of 3: 282 usec per loop
> >
> > To be clear, for the scenario of defaultdict(list), defaultdict *is*
> > around 20% faster than the optimized dict equivalent (py2.6 timings
> > specifically).
> >
> > Where defaultdict breaks down is when instantiation is semi complex-
> > if you have to rely on lambda for example, the frame overhead is going
> > to add up pretty quick.   In addition, reliance on defaultdict means
> > you lose the ability to do site specific faster versions ([0,1] for
> > example).
> >
> > Pretty much, if you're going to sell folk on defaultdict, sell them on
> > code cleanliness:
> >
> > # consider this:
> > d = {}
> > for key, (subkey, val) in source_items:
> > if key in d:
> >  if subkey in d[key]:
> >   d[key][subkey].extend(subval)
> >  else:
> >   d[key][subkey] = list(subval)
> > else:
> >  d[key] = {subkey:list(subval)}
> >
> > # simplified...
> > d = {}
> > for key, (subkey, val) in source_items:
> > d.setdefault(key, {}).setdefault(subkey, []).extend(val)
> >
> > # or using defaultdict
> > d = defaultdict(partial(defaultdict, list))
> > for key, (subkey, val) in source_items:
> > d[key][subkey].extend(val)
> >
> >
> > While that's an abbreviated example, if one imagines the logic being
> > a fair bit more complex for assignment/mangling of subkey/val, the
> > slightly tricky defaultdict setup used in the last example is offset
> > in maintenance costs pretty quickly via not having to dink around
> > with setdefault or equivalent logic.
> >
> > Hell, consider the following-
> >
> > def f():
> > return defaultdict(f)
> >
> > my_tree = f()
> > my_tree[1][2][3][4][5][6][7][8][9]
> > my_tree[1][3]
> >
> > Stuff like that really is the power of defaultdict (more the power of
> > __missing__, but you seem to be skipping that detail in your nugget
> > thingy-mo-bob).
> >
> > ~harring
> > _______________________________________________
> > Baypiggies mailing list
> > Baypiggies at python.org
> > To change your subscription options or unsubscribe:
> > http://mail.python.org/mailman/listinfo/baypiggies
>
>


-- 
Whatever you can do or imagine, begin it;
boldness has beauty, magic, and power in it.

-- Goethe
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/baypiggies/attachments/20100827/49c37b0e/attachment.html>


More information about the Baypiggies mailing list