list comprehension help

Alex Martelli aleax at mac.com
Sun Mar 18 19:52:38 EDT 2007


George Sakkis <george.sakkis at gmail.com> wrote:
   ...
> > Unless each line is huge, how exactly you split it to get the first and
> > last blank-separated word is not going to matter much.
> >
> > Still, you should at least avoid repeating the splitting twice, that's
> > pretty obviously sheer waste: so, change that loop body to:
> >
> >     words = line.split(' ')
> >     db[words[0]] = words[-1]
> >
> > If some lines are huge, splitting them entirely may be far more work
> > than you need.  In this case, you may do two partial splits instead, one
> > direct and one reverse:
> >
> >     first_word = line.split(' ', 1)[0]
> >     last_word = line.rsplit(' ', 1)[-1]
> >     db[first_word] = last_word
> 
> I'd guess the following is in theory faster, though it might not make
> a measurable difference:
> 
>     first_word = line[:line.index(' ')]
>     last_word = line[line.rindex(' ')+1:]
>     db[first_word] = last_word
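
Both snippets above are of course meant as the body of a loop over the
file's lines.  As a rough sketch, with a hypothetical input file
'huge.txt' and a plain dict standing in for whatever db actually is:

    db = {}
    f = open('huge.txt')
    for line in f:
        line = line.rstrip()
        # index/rindex raise ValueError on a line with no space at
        # all, so this sketch assumes each line has at least two words
        first_word = line[:line.index(' ')]
        last_word = line[line.rindex(' ')+1:]
        db[first_word] = last_word
    f.close()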

If the lines are huge, the difference is quite measurable:

brain:~ alex$ python -mtimeit -s"line='ciao '*999" "first=line.split(' ',1)[0]; line=line.rstrip(); second=line.rsplit(' ',1)[-1]"
100000 loops, best of 3: 3.95 usec per loop

brain:~ alex$ python -mtimeit -s"line='ciao '*999" "first=line[:line.index(' ')]; line=line.rstrip(); second=line[line.rindex(' ')+1:]"
1000000 loops, best of 3: 1.62 usec per loop

brain:~ alex$ 
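
For anyone who'd rather not retype the shell one-liners, here's a
rough standalone equivalent of all three timings discussed in this
message (a sketch only: absolute numbers will of course vary with
machine and Python build):

    import timeit

    setup = "line = 'ciao ' * 999"
    tests = [
        ("split/rsplit, maxsplit=1",
         "first = line.split(' ', 1)[0]; "
         "stripped = line.rstrip(); "
         "second = stripped.rsplit(' ', 1)[-1]"),
        ("index/rindex slicing",
         "first = line[:line.index(' ')]; "
         "stripped = line.rstrip(); "
         "second = stripped[stripped.rindex(' ')+1:]"),
        ("full split, as originally posted",
         "first = line.split(' ')[0]; "
         "stripped = line.rstrip(); "
         "second = stripped.rsplit(' ')[-1]"),
    ]
    for name, stmt in tests:
        # best of 3 repeats of 10000 runs each, reported per loop
        best = min(timeit.Timer(stmt, setup).repeat(3, 10000))
        print("%s: %.2f usec per loop" % (name, best / 10000 * 1e6))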

So, if the 4GB file were made up, say, of 859853 such lines, using the
index/rindex approach might save a couple of seconds overall.
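
For concreteness, the arithmetic behind that figure (assuming every
line matches the 4995-byte 'ciao '*999 line used in the timings):

    line_bytes = len('ciao ' * 999)     # 4995 bytes per line
    lines = 4 * 2**30 // line_bytes     # 859853 lines in a 4GB file
    print(lines * (3.95e-6 - 1.62e-6))  # ~2.0 seconds saved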

The lack of the ,1 maxsplit argument in the split/rsplit calls (i.e.,
essentially, the code as originally posted) brings the snippet time to
226 microseconds; here, the speedup from index/rindex might therefore
amount to a couple HUNDRED seconds in all.
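
Again, by the same arithmetic as above (a sketch, reusing the
859853-line figure):

    lines = 4 * 2**30 // len('ciao ' * 999)  # 859853
    print(lines * (226e-6 - 1.62e-6))        # ~193 seconds saved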


Alex


