list comprehension help
Alex Martelli
aleax at mac.com
Sun Mar 18 13:40:25 EDT 2007
rkmr.em at gmail.com <rkmr.em at gmail.com> wrote:
> On 3/18/07, Marc 'BlackJack' Rintsch <bj_666 at gmx.net> wrote:
> > In <mailman.5248.1174235057.32031.python-list at python.org>, Daniel Nogradi
> > wrote:
> >
> > >> f = open('file.txt','r')
> > >> for line in f:
> > >>     db[line.split(' ')[0]] = line.split(' ')[-1]
> > >> db.sync()
> > >
> > > What is db here? Looks like a dictionary but that doesn't have a sync
> > >method.
> >
> > Shelves (`shelve` module) have this API. Syncing forces the changes
> > to be written to disk, bypassing the operating system's caching and
> > buffering, so it may slow down the program considerably.
>
> It is a handle for bsddb
>
> import bsddb
> db=bsddb.hashopen('db_filename')
> Syncing will definitely slow it down, so I will do that less often. But is
> there any improvement I can make to the other part, the splitting and
> setting of the key/value pair?
Unless each line is huge, how exactly you split it to get the first and
last blank-separated word is not going to matter much.
Still, you should at least avoid splitting each line twice -- that's
pretty obviously sheer waste: so, change that loop body to:
words = line.split(' ')
db[words[0]] = words[-1]
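For concreteness, here's that loop in a runnable form -- a plain dict stands in for the bsddb handle, and a couple of in-memory sample lines replace 'file.txt', so the sketch runs anywhere:

```python
# Split-once loop: a plain dict stands in for the bsddb/shelve handle,
# and sample lines stand in for the file from the thread.
lines = ["key1 some middle text val1", "key2 val2"]

db = {}
for line in lines:
    words = line.split(' ')       # split exactly once per line
    db[words[0]] = words[-1]      # first word -> last word

print(db)  # {'key1': 'val1', 'key2': 'val2'}
```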
If some lines are huge, splitting them entirely may be far more work
than you need. In this case, you may do two partial splits instead, one
direct and one reverse:
first_word = line.split(' ', 1)[0]
last_word = line.rsplit(' ', 1)[-1]
db[first_word] = last_word
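To see why the partial splits help, note that maxsplit=1 makes each call stop at the first separator it meets, so neither call scans or allocates the middle words of a long line:

```python
# Partial splits on a deliberately long line: split(' ', 1) stops after
# the first space, rsplit(' ', 1) after the last, so the many middle
# words are never individually allocated.
line = "first " + "middle " * 1000 + "last"

first_word = line.split(' ', 1)[0]
last_word = line.rsplit(' ', 1)[-1]

print(first_word, last_word)  # first last
```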
You could also try to extract the first and last words by re or direct
string manipulations, but I doubt that would buy you much, if any,
performance improvement in comparison to the partial-splits. In the
end, only by "benchmarking" (measuring performance on sample data of
direct relevance to your application) can you find out.
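A minimal benchmarking sketch along those lines, using the stdlib timeit module (the sample line and repeat count are arbitrary stand-ins -- measure on your own data, as the absolute numbers vary by machine):

```python
# Compare a full split against the two partial splits on a long line.
import timeit

setup = "line = 'first ' + 'x ' * 10000 + 'last'"
full = timeit.timeit(
    "w = line.split(' '); w[0], w[-1]", setup=setup, number=1000)
partial = timeit.timeit(
    "line.split(' ', 1)[0], line.rsplit(' ', 1)[-1]",
    setup=setup, number=1000)

print('full split:    %.4fs' % full)
print('partial split: %.4fs' % partial)
```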
Alex