list comprehension help

Daniel Nogradi nogradi at gmail.com
Sun Mar 18 13:12:25 EDT 2007


> > > I need to process a really huge text file (4GB) and this is what I
> > > do; I was wondering whether "list comprehension" can speed things up.
> > > Can you point out how to do that?
> > > f = open('file.txt','r')
> > > for line in f:
> > >         db[line.split(' ')[0]] = line.split(' ')[-1]
> > >         db.sync()
> >
> > What is db here? Looks like a dictionary but that doesn't have a sync method.
>
> db is a handle for a Berkeley DB that I open with the bsddb module:
>
> import bsddb
> db=bsddb.hashopen('db_filename')
>
> > If the file is 4GB are you sure you want to store the whole thing into
> > memory?
>
> I don't want to load it into memory. Once I call the sync() function it
> gets synced to disk, and it is not loaded completely.
>
> > use a list comprehension like this:
> > db = [ line.split(' ')[-1] for line in open('file.txt','r') ]
> > or
> > db = [ ( line.split(' ')[0], line.split(' ')[-1] ) for line in
> > open('file.txt','r') ]
> >
> > depending on what exactly you want to store.
>
> line.split(' ')[0] is the key and line.split(' ')[-1] is the value.
> That is what I want to store.
> Will the second list comprehension work in this case?

No, I don't think so. I only gave that example because you asked about
list comprehensions and I didn't know what db was; as the name says, a
list comprehension builds a list. Since the bsddb handle is not a list,
I'm afraid you won't be able to populate it with a list comprehension.
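Just to make that concrete: the second comprehension would build an
ordinary in-memory list of (key, value) tuples, something like

pairs = [ ( line.split(' ')[0], line.split(' ')[-1] )
          for line in open('file.txt','r') ]

which for a 4GB file is exactly the everything-in-memory situation you
said you want to avoid, and the resulting list still wouldn't be your
bsddb database.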

This small change (splitting each line only once instead of twice) will
speed up the loop a little, but since the per-record sync() is I/O bound
it won't gain you much (the timeit module will tell you how much you
save, if at all):

for line in f:
    parts = line.split(' ')       # split each line only once
    db[parts[0]] = parts[-1]
    db.sync()
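
If your data permits it, a bigger win would probably come from calling
sync() less often rather than after every single record. A rough sketch
(the batch size of 10000 is just a guess, and it assumes you can tolerate
losing the last unsynced batch if the program dies mid-run):

import bsddb

db = bsddb.hashopen('db_filename')
f = open('file.txt', 'r')

for i, line in enumerate(f):
    parts = line.split(' ')       # split each line only once
    db[parts[0]] = parts[-1]
    if i % 10000 == 0:            # flush to disk every 10000 records
        db.sync()

f.close()
db.sync()                         # final flush for the remaining records
db.close()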

HTH,
Daniel


