list comprehension help

George Sakkis george.sakkis at gmail.com
Sun Mar 18 19:35:28 EDT 2007


On Mar 18, 1:40 pm, a... at mac.com (Alex Martelli) wrote:
> rkmr... at gmail.com <rkmr... at gmail.com> wrote:
> > On 3/18/07, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
> > > In <mailman.5248.1174235057.32031.python-l... at python.org>, Daniel Nogradi
> > > wrote:
>
> > > >> f = open('file.txt','r')
> > > >> for line in f:
> > > >>         db[line.split(' ')[0]] = line.split(' ')[-1]
> > > >>         db.sync()
>
> > > > What is db here? Looks like a dictionary but that doesn't have a sync
> > > > > method.
>
> > > Shelves (`shelve` module) have this API.  And syncing forces the changes
> > > to be written to disk, so all caching and buffering of the operating
> > > system is prevented.  So this may slow down the program considerably.
>
> > It is a handle for bsddb
>
> > import bsddb
> > db=bsddb.hashopen('db_filename')
> > Syncing will definitely slow it down. I will cut that down. But is there
> > any improvement I can make to the other part, the splitting and setting
> > of the key/value pair?
>
> Unless each line is huge, how exactly you split it to get the first and
> last blank-separated word is not going to matter much.
>
> Still, you should at least avoid repeating the splitting twice, that's
> pretty obviously sheer waste: so, change that loop body to:
>
>     words = line.split(' ')
>     db[words[0]] = words[-1]
>
> If some lines are huge, splitting them entirely may be far more work
> than you need.  In this case, you may do two partial splits instead, one
> direct and one reverse:
>
>     first_word = line.split(' ', 1)[0]
>     last_word = line.rsplit(' ', 1)[-1]
>     db[first_word] = last_word

I'd guess the following is in theory faster, though it might not make
a measurable difference:

    first_word = line[:line.index(' ')]
    last_word = line[line.rindex(' ')+1:]
    db[first_word] = last_word
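
One caveat with the index/rindex version: str.index raises ValueError
when the line contains no space at all, whereas the split-based version
simply returns the whole line. A minimal sketch of both, assuming a
one-word line should count as both first and last word (the function
names are made up for illustration):

```python
def first_last_by_index(line):
    """Return (first_word, last_word) using index/rindex, as in the snippet above."""
    # str.index raises ValueError when no space is found, so a one-word
    # line needs separate handling; here it is treated as both first and last.
    try:
        first_word = line[:line.index(' ')]
        last_word = line[line.rindex(' ') + 1:]
    except ValueError:
        first_word = last_word = line
    return first_word, last_word


def first_last_by_split(line):
    """Same result via the two partial splits quoted earlier."""
    return line.split(' ', 1)[0], line.rsplit(' ', 1)[-1]


sample = "key some filler text value"
print(first_last_by_index(sample))
print(first_last_by_split(sample))
```

Whether the slicing variant is actually faster is best checked with
timeit on your own data.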

By the way, a gotcha is that the file iterator yields lines that
retain the newline character; you have to strip it off if you don't
want it, either with .rstrip('\n') or (at least on *n*x) by omitting
the last character:

    last_word = line[line.rindex(' ')+1 : -1]
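
The two approaches differ when the final line of the file has no
trailing newline: the slice then eats a real character, while
.rstrip('\n') leaves the word alone. A small sketch, using io.StringIO
and plain dicts as stand-ins for the real file and the bsddb handle
(both are assumptions made only for the demonstration):

```python
import io

# In-memory stand-in for the file; note the last line has no trailing newline.
f = io.StringIO("a x 1\nb y 2")

stripped = {}  # values cleaned with .rstrip('\n')
sliced = {}    # values cleaned by dropping the last character

for line in f:
    key = line[:line.index(' ')]
    start = line.rindex(' ') + 1
    stripped[key] = line[start:].rstrip('\n')
    sliced[key] = line[start:-1]

print(stripped)
print(sliced)
```

Here stripped['b'] is '2' but sliced['b'] is the empty string, so
.rstrip('\n') is the safer choice unless you know every line ends with
a newline.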

George



