High memory usage - program mistake or Python feature?

Sun May 25 09:12:55 EDT 2003

Ben S wrote:
> Gerald Klix wrote:
> 
>>It is difficult to diagnose if we can't see the whole program.
>>But I suppose the re module makes some (superfluous) copies.
>>I would like to see the figures from LoadLogFile alone.
> 
> 
> That is the whole program, pretty much. The rest is just reading a query
> string from an HTTP request so that it can get a filename to pass to
> LoadLogFile. Then there are several lines that use 'for' over the
> results returned from GetLinesContainingCommand, which only returns 20
> or 30 lines at most.
Obviously your program makes three in-memory copies of your log file.
Aahz explained why there must be at least two copies, but I still
wonder how you managed to make the third one.
Which Python version do you use? Is pymalloc enabled?

> 
> 
>>Some performance hints:
>>You can gain a little by using xreadlines. It does
>>not read the whole file at once. But if you map
>>the sequence to another sequence you gain almost nothing.
> 
> 
> If I use xreadlines (and drop the string stripping) I suppose I would
> gain a lot on memory but lose out on speed (due to multiple passes over
> the file), right? This isn't necessarily a bad thing since it's running
> as a CGI script, and therefore the bottleneck is usually going to be the
> network, I expect.
You should check whether multiple passes are necessary. Perhaps you can
apply all your regular expressions together to each line or, even
better, combine all your regular expressions into one.
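
A rough sketch of the single-pass idea, written for a current Python
where iterating over the file object plays the role of xreadlines.
The pattern names and the scan_lines helper are made up for
illustration, since the real expressions are not shown in this thread:

```python
import re

# Hypothetical command patterns; substitute the real ones.
COMMAND_PATTERNS = {
    "login": r"\bLOGIN\b",
    "logout": r"\bLOGOUT\b",
}

# One alternation with a named group per original pattern, so a single
# search per line tells us which pattern matched.
COMBINED = re.compile(
    "|".join("(?P<%s>%s)" % (name, pattern)
             for name, pattern in COMMAND_PATTERNS.items())
)

def scan_lines(lines):
    """Collect matching lines per pattern name in one pass over `lines`."""
    hits = {name: [] for name in COMMAND_PATTERNS}
    for line in lines:
        match = COMBINED.search(line)
        if match:
            # lastgroup is the name of the alternative that matched.
            hits[match.lastgroup].append(line.rstrip("\n"))
    return hits
```

Passing an open file (for example scan_lines(open(filename))) reads one
line at a time, so only the matching lines stay in memory. Note that a
line matching several patterns is filed only under the one that matches
first in the line; if that matters, apply the separate expressions to
each line inside the same loop instead.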


> 
> Thanks for your other ideas, too.
You should give mmap a try, especially if you can combine all your regular
expressions into one.
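
Something along these lines, as a sketch only. It assumes a current
Python, a non-empty log file (mapping an empty file raises ValueError),
and a made-up combined pattern; find_command_lines is a hypothetical
name, not a function from your program:

```python
import mmap
import re

# Hypothetical combined pattern. It is a bytes pattern because the
# mapped file is searched as raw bytes. With re.MULTILINE, ^ and $
# anchor at line boundaries, so each match is one whole line.
LINE_WITH_COMMAND = re.compile(rb"^.*\b(?:LOGIN|LOGOUT)\b.*$", re.MULTILINE)

def find_command_lines(path):
    """Return the lines of the file at `path` that contain a command."""
    with open(path, "rb") as log_file:
        # Length 0 maps the whole file; the OS pages it in on demand
        # instead of Python holding a full copy as a string.
        with mmap.mmap(log_file.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # No capturing groups, so findall returns the full matches.
            return LINE_WITH_COMMAND.findall(mapped)
```

The regular expression engine runs directly over the mapped file, so
the only objects created are the 20 or 30 matching lines.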

cya
Gerald