list comprehension help

George Sakkis george.sakkis at gmail.com
Sun Mar 18 22:01:27 EDT 2007


On Mar 18, 12:11 pm, "rkmr... at gmail.com" <rkmr... at gmail.com> wrote:

> Hi
> I need to process a really huge text file (4GB) and this is what I
> need to do. It takes forever to complete this. I read somewhere that
> "list comprehension" can speed things up. Can you point out how to do
> it in this case?
> thanks a lot!
>
> f = open('file.txt','r')
> for line in f:
>         db[line.split(' ')[0]] = line.split(' ')[-1]
>         db.sync()

You got several good suggestions; one that has not been mentioned but
makes a big (or even the biggest) difference for large/huge files is
the buffering parameter of open(). Set it to the largest value you can
afford in order to keep I/O as low as possible. I'm processing 15-25 GB
files (you see, "huge" is really relative ;-)) on 2-4 GB RAM boxes, and
setting a big buffer (1 GB or more) reduces the wall time by 30 to 50%
compared to the default value. BerkeleyDB should have a buffering
option too; make sure you use it, and don't sync on every line.
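
Something along these lines, for example (untested sketch; the 1 GB
buffer and the 100000-line sync interval are made-up numbers you'd tune
for your box, and db is whatever BerkeleyDB handle you already have):

buf_size = 1024 * 1024 * 1024          # 1 GB; use whatever your RAM allows
f = open('file.txt', 'r', buf_size)    # third argument is the buffer size

for i, line in enumerate(f):
    parts = line.split(' ')            # split once instead of twice per line
    db[parts[0]] = parts[-1]
    if i % 100000 == 0:                # sync in batches, not on every record
        db.sync()

db.sync()                              # final flush
f.close()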

Best,
George



