list comprehension help

Alex Martelli aleax at mac.com
Sun Mar 18 13:40:25 EDT 2007


rkmr.em at gmail.com <rkmr.em at gmail.com> wrote:

> On 3/18/07, Marc 'BlackJack' Rintsch <bj_666 at gmx.net> wrote:
> > In <mailman.5248.1174235057.32031.python-list at python.org>, Daniel Nogradi
> > wrote:
> >
> > >> f = open('file.txt','r')
> > >> for line in f:
> > >>         db[line.split(' ')[0]] = line.split(' ')[-1]
> > >>         db.sync()
> > >
> > > What is db here? Looks like a dictionary but that doesn't have a sync
> > > method.
> >
> > Shelves (`shelve` module) have this API.  Syncing forces the changes
> > to be written to disk, bypassing the operating system's caching and
> > buffering, so it may slow the program down considerably.
> 
> It is a handle for bsddb
> 
> import bsddb
> db=bsddb.hashopen('db_filename')
> Syncing will definitely slow it down.  But is there any improvement I
> can do to the other part, the splitting and setting of the key/value
> pair?

Unless each line is huge, how exactly you split it to get the first and
last blank-separated word is not going to matter much.

Still, you should at least avoid splitting each line twice; that's
pretty obviously sheer waste.  So, change that loop body to:

    words = line.split(' ')
    db[words[0]] = words[-1]

If some lines are huge, splitting them entirely may be far more work
than you need.  In this case, you may do two partial splits instead, one
direct and one reverse:

    first_word = line.split(' ', 1)[0]
    last_word = line.rsplit(' ', 1)[-1]
    db[first_word] = last_word

You could also try to extract the first and last words by re or direct
string manipulations, but I doubt that would buy you much, if any,
performance improvement in comparison to the partial-splits.  In the
end, only by "benchmarking" (measuring performance on sample data of
direct relevance to your application) can you find out.
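A minimal benchmarking sketch with the `timeit` module (the sample line
here is made up; substitute lines representative of your actual data):

```python
import timeit

# a hypothetical long line: first word, lots of filler, last word
line = "first " + "x " * 1000 + "last"

def full_split():
    # split the whole line, then index
    words = line.split(' ')
    return words[0], words[-1]

def partial_splits():
    # one direct and one reverse partial split
    return line.split(' ', 1)[0], line.rsplit(' ', 1)[-1]

for fn in (full_split, partial_splits):
    t = timeit.timeit(fn, number=10000)
    print('%s: %.3f seconds' % (fn.__name__, t))
```

Both approaches return the same pair of words; whether the partial
splits are worth it depends on how long your lines really are.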


Alex


