list comprehension help

Alex Martelli aleax at mac.com
Mon Mar 19 01:11:44 EDT 2007


rkmr.em at gmail.com <rkmr.em at gmail.com> wrote:
   ...
> > > files (you see "huge" is really relative ;-)) on 2-4GB RAM boxes and
> > > setting a big buffer (1GB or more) reduces the wall time by 30 to 50%
> > > compared to the default value. BerkeleyDB should have a buffering
> > Out of curiosity, what OS and FS are you using?  On a well-tuned FS and
> 
> Fedora Core 4 and ext3. Is there something I should do to the FS?

In theory, nothing.  In practice, this is strange.

> Which should I do? How much buffer should I allocate? I have a box
> with 2GB memory.

I'd be curious to see a read-only loop on the file, opened with (say)
1MB of buffer vs 30MB vs 1GB -- just loop on the lines, do a .split() on
each, and do nothing with the results.  What elapsed times do you
measure with each buffer size...?
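A minimal sketch of that benchmark in modern Python (the file path and sample data are stand-ins; point it at the actual huge file, and note that `open()`'s `buffering` argument is how you set the buffer size):

```python
import os
import tempfile
import time

def time_read(path, buffer_size):
    """Time a read-only pass: loop over lines, .split() each one,
    and discard the results."""
    start = time.perf_counter()
    with open(path, buffering=buffer_size) as f:
        for line in f:
            line.split()
    return time.perf_counter() - start

# Stand-in data file; the real benchmark would use the actual huge file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines("alpha beta gamma delta\n" for _ in range(100_000))

try:
    # 1MB and 30MB buffers; add 2**30 for the 1GB case on a big box.
    for size in (2**20, 30 * 2**20):
        elapsed = time_read(tmp.name, size)
        print(f"{size // 2**20:>4} MB buffer: {elapsed:.3f}s elapsed")
finally:
    os.remove(tmp.name)
```

Run it a few times per buffer size and compare the elapsed times; if the gap between 1MB and 1GB stays dramatic, something below Python (FS, disk, or another process) is the likely culprit.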

If the huge buffers confirm their worth, it's time to take a nice
critical look at what other processes you're running and what they're
all doing to your disk -- maybe some daemon (or frequently-run cron
entry, etc) is out of control...?  You could try running the benchmark
again in single-user mode (with essentially nothing else running) and
see how the elapsed-time measurements change...


Alex
