High memory usage - program mistake or Python feature?

Aahz aahz at pythoncraft.com
Fri May 23 09:20:10 EDT 2003


In article <bal3vd$7i3$1 at newsg4.svr.pol.co.uk>,
Ben S <bens at replytothegroupplease.com> wrote:
>
>def LoadLogFile(filename):
>    """Loads a log file as a collection of lines"""
>    try:
>        logFile = file(filename, 'rU')
>        lines = map(string.strip, logFile.readlines())
>    except IOError:
>        return False
>    return lines
>
>The 'problem' was that, when operating on a 50MB file, the memory usage
>(according to ps on Linux) rocketed to just over 150MB. Since there's no
>other significant storage in the script, I can only assume that the
>lines (corresponding to strings of between 40 and 90 ASCII characters)
>are being stored in such a way that their size is inflated to 3x their
>usual size. I've not specified any Unicode usage anywhere, nor does the
>text file in question use any characters above 127, as far as I know.
>The GetLinesContainingCommand function returns a tiny subset (no more
>than 20 or 30 lines out of tens of thousands) so I doubt it's that
>causing the problem.
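
Besides any duplicate copy of the data, part of a 3x figure like this can plausibly come from per-object overhead. In CPython every line is a separate string object with a fixed-size header, so short lines pay a large relative cost, and the list holding them adds a pointer per line. Here is a minimal sketch that measures this with `sys.getsizeof`. It assumes a modern CPython 3, whose string layout differs from the 2003 Python 2 one. The sample lines are made up to match the 40 to 90 character range in the question, and the exact byte counts vary by version and platform.

```python
import sys

# Made-up sample lines in the 40-90 ASCII character range from the
# question; the byte counts printed are CPython-specific.
samples = ["x" * n for n in (40, 60, 90)]

for s in samples:
    payload = len(s.encode("ascii"))  # bytes of actual text
    total = sys.getsizeof(s)          # bytes CPython reports for the str object
    print(f"{payload:3d} chars -> {total:4d} bytes ({total - payload} overhead)")

# The list object adds its own header plus one pointer per element,
# on top of the string objects themselves.
lines = samples * 1000
print(sys.getsizeof(lines), "bytes for the list object and its pointer array")
```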

The part about high-bit characters is a complete red herring.  Note that
you've got the file in memory twice: once from readlines() and once from
map(), and both copies exist at the same time while map() runs.  Although
the list from readlines() is only a temporary object, the memory it
consumed doesn't get returned to the OS after it's freed.  If you care
about memory usage, never slurp an entire file into memory; iterate over
the file object one line at a time instead.
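
Here is a minimal sketch of the no-slurping approach in Python 3. It reads the log one line at a time and keeps only the lines that match. `lines_containing` is a hypothetical stand-in for the GetLinesContainingCommand function, whose body wasn't posted. The UTF-8 encoding is also an assumption, though it reads plain ASCII logs unchanged.

```python
def lines_containing(filename, needle):
    """Return the stripped lines of `filename` that contain `needle`.

    Hypothetical helper: reads one line at a time, so peak memory is
    roughly one line plus the (small) list of matches, not the whole file.
    """
    matches = []
    # Text mode with the default newline=None gives universal-newline
    # handling, the Python 3 counterpart of the 'rU' mode quoted above.
    with open(filename, "r", encoding="utf-8", errors="replace") as log_file:
        for line in log_file:  # the file object yields lines lazily
            line = line.strip()
            if needle in line:
                matches.append(line)
    return matches
```

Because the file object is consumed lazily, each non-matching line becomes garbage as soon as the next one is read. Only the 20 or 30 matches stay alive.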
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"In many ways, it's a dull language, borrowing solid old concepts from
many other languages & styles:  boring syntax, unsurprising semantics,
few automatic coercions, etc etc.  But that's one of the things I like
about it."  --Tim Peters on Python, 16 Sep 93




More information about the Python-list mailing list