list comprehension help

rkmr.em at gmail.com rkmr.em at gmail.com
Sun Mar 18 23:40:58 EDT 2007


On 18 Mar 2007 19:01:27 -0700, George Sakkis <george.sakkis at gmail.com> wrote:
> On Mar 18, 12:11 pm, "rkmr... at gmail.com" <rkmr... at gmail.com> wrote:
> > I need to process a really huge text file (4GB), and this is what I
> > need to do. It takes forever to complete. I read somewhere that
> > "list comprehension" can speed things up. Can you point out how to do
> > it in this case?
> > thanks a lot!
> >
> > f = open('file.txt','r')
> > for line in f:
> >         db[line.split(' ')[0]] = line.split(' ')[-1]
> >         db.sync()
> You got several good suggestions; one that has not been mentioned but
> makes a big (or even the biggest) difference for large/huge files is
> the buffering parameter of open(). Set it to the largest value you can
> afford, to keep the I/O as low as possible. I'm processing 15-25 GB

Can you give an example of how you process the 15-25 GB files with the
buffering parameter?
It would be educational for everyone, I think.

> files (you see "huge" is really relative ;-)) on 2-4GB RAM boxes, and
> setting a big buffer (1GB or more) reduces the wall time by 30 to 50%
> compared to the default value. BerkeleyDB should have a buffering
> option too; make sure you use it and don't synchronize on every line.
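
Just so the archive has something concrete, here is roughly what passing a
big buffer to open() looks like -- a sketch only, not George's actual
script; the 1 GB figure and the file name are placeholders:

    # Read the file through a large (here 1 GB) buffer so the underlying
    # I/O happens in big chunks instead of many small reads.
    BUF_SIZE = 1024 * 1024 * 1024   # the buffering argument is in bytes

    f = open('file.txt', 'r', BUF_SIZE)
    for line in f:
        pass                        # process each line here
    f.close()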

I changed the sync to once every 100,000 lines.
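For the record, the loop now looks roughly like this (a sketch: db is
assumed to be a BerkeleyDB-style table with a sync() method, opened here
with bsddb.hashopen; the buffer size, database file name and batch count
are just the values discussed above):

    import bsddb                          # assumption: any dbm-style store with sync() would do

    BUF_SIZE = 1024 * 1024 * 1024         # 1 GB read buffer, as suggested
    SYNC_EVERY = 100000                   # flush to disk once per 100,000 lines

    db = bsddb.hashopen('file.db', 'c')   # placeholder database file
    f = open('file.txt', 'r', BUF_SIZE)
    for count, line in enumerate(f):
        fields = line.split(' ')          # split each line only once
        db[fields[0]] = fields[-1]
        if count and count % SYNC_EVERY == 0:
            db.sync()
    db.sync()                             # final flush for the last partial batch
    f.close()
    db.close()
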
Thanks a lot, everyone!


