writing from file into dictionary

sismex01 at hebmex.com
Mon Nov 11 10:39:05 EST 2002


> From: Alex Martelli [mailto:aleax at aleax.it]
> 
> On Monday 11 November 2002 01:15 pm, Brad Hards wrote:
> > On Mon, 11 Nov 2002 22:16, Alex Martelli wrote:
> > <snip>
> >
> > > Assuming the trailing colon at the end of the 'rewrite' part
> > > (and anything following it, such as the comment in the last
> > > line) is to be removed; and that corresponding to each head
> > > you want a list of the possible rewrites for that head;
> >
> > Can I try an explanation?
> 
> Yes.  Python 1.5.2 idioms equivalent to this statement include:
> 
> if not grammardict.has_key(head):
>     grammardict[head] = []
> grammardict[head].append(rewrite)
> 
>

I think a better way than the snippet above would be:

   try:
      grammardict[head].append(rewrite)
   except KeyError:
      grammardict[head] = [ rewrite ]

This has the advantage of indexing 'head' only twice in the worst
case (when it's not yet in the dictionary), and only once after
it has been inserted.
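To make the "twice, then once" claim concrete, here is a small sketch in
modern Python 3 (not the 2.2 used below). CountingDict and add_rewrite are
hypothetical names made up for illustration, and "S", "NP VP" are just
sample data. The dict subclass counts every read and write of a key, so
you can watch how many times the try/except idiom indexes the dictionary:

```python
class CountingDict(dict):
    """Hypothetical helper: a dict that counts key reads and writes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.index_count = 0

    def __getitem__(self, key):
        self.index_count += 1          # count the read, even if it fails
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        self.index_count += 1          # count the write
        super().__setitem__(key, value)


def add_rewrite(grammardict, head, rewrite):
    # Assume the key is already there; only fall back on a miss.
    try:
        grammardict[head].append(rewrite)
    except KeyError:
        grammardict[head] = [rewrite]


d = CountingDict()
add_rewrite(d, "S", "NP VP")   # miss: one failed read plus one write
print(d.index_count)           # 2
add_rewrite(d, "S", "VP")      # hit: a single read
print(d.index_count)           # 3
print(d)                       # {'S': ['NP VP', 'VP']}
```

The catch, as the timings below suggest, is that the miss path pays for
raising and catching an exception, not just for the extra indexing.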

I haven't done any timing; has anybody? It just "feels" cleaner.
*sigh* Python has made me actually *seek out* exception handlers;
so much for staying away from them back in my C++ days.

Good luck y'all.


OK OK OK OK OK

So I did a bit of trivial timing on a few functions that implement
checking for and adding to a dictionary using setdefault, has_key or
try/except, and the results surprise me somewhat. I'm using Python 2.2,
just in case things have changed.

The functions take two lists as arguments: the list of dictionary
keys, and a list of items that are appended to the lists stored
in the dictionary under those keys. Each function is just two
nested for loops: the outer one iterates over the keys, the inner
one over the items.

Using keys=range(100), items=range(10):
   test_setdefault() --> 8.562000 ms/call
   test_haskey()     --> 6.700000 ms/call
   test_except()     --> 8.592000 ms/call

huh!  says me.  I thought setdefault() was more efficient
than this.  Maybe it's the keys/items ratio, let's check:

Using keys=range(1000), items=range(1):
   test_setdefault() --> 13.259000 ms/call
   test_haskey()     --> 13.509000 ms/call
   test_except()     --> 41.190000 ms/call

ouch, again.  Let's switch the ratio over to the other side:

Using keys=range(10), items=range(1000):
   test_setdefault() --> 8.201000 ms/call
   test_haskey()     --> 6.029000 ms/call
   test_except()     --> 4.587000 ms/call

Ah! Vindicated! Let's make it a bit more extreme:

Using keys=range(2), items=range(500):
   test_setdefault() --> 8.752000 ms/call
   test_haskey()     --> 6.259000 ms/call
   test_except()     --> 4.557000 ms/call
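One plausible reading of these numbers is that test_except raises exactly
one KeyError per key, so what matters is how many appends share the cost of
each exception. The sketch below just counts that for the four runs
above. count_misses is a hypothetical helper, and it counts exceptions
rather than measuring their cost:

```python
def count_misses(keys, items):
    """Count KeyErrors raised by the try/except idiom vs. total appends."""
    d = {}
    misses = attempts = 0
    for k in keys:
        for i in items:
            attempts += 1
            try:
                d[k].append(i)
            except KeyError:
                misses += 1
                d[k] = [i]
    return misses, attempts


for n_keys, n_items in [(100, 10), (1000, 1), (10, 1000), (2, 500)]:
    misses, attempts = count_misses(range(n_keys), range(n_items))
    print("keys=%d items=%d: %d KeyErrors in %d appends (%.1f%%)"
          % (n_keys, n_items, misses, attempts, 100.0 * misses / attempts))
# keys=100 items=10: 100 KeyErrors in 1000 appends (10.0%)
# keys=1000 items=1: 1000 KeyErrors in 1000 appends (100.0%)
# keys=10 items=1000: 10 KeyErrors in 10000 appends (0.1%)
# keys=2 items=500: 2 KeyErrors in 1000 appends (0.2%)
```

The slowest try/except run (keys=range(1000), items=range(1)) is the one
where every append raises, and the fastest runs are the ones where
almost none do.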


Interesting. I hope this clears things up a bit; it did
for me, a lot :-)

-gustavo

P.S. Have a nice Monday :-)
