Please help with MemoryError

Jonathan Gardner jgardner at jonathangardner.net
Thu Feb 11 19:36:45 EST 2010


On Feb 11, 3:39 pm, Jeremy <jlcon... at gmail.com> wrote:
> I have been using Python for several years now and have never run into
> memory errors…
>
> until now.
>

Yes, Python does a good job of making memory errors the least of your
worries as a programmer. Maybe it's doing too good of a job...

> My Python program now consumes over 2 GB of memory and then I get a
> MemoryError.  I know I am reading lots of files into memory, but not
> 2GB worth.

Do a quick calculation: How much are leaving around after you read in
a file? Do you create an object for each line? What does that object
have associated with it? You may find that you have some strange
O(N^2) behavior regarding memory here. Oftentimes people forget that
you have to evaluate how your algorithm will run in time *and* memory.

> I thought I didn't have to worry about memory allocation
> in Python because of the garbage collector.

While it's not the #1 concern, you still have to keep track of how you
are using memory and try not to be wasteful. Use good algorithms, let
things fall out of scope, etc...

> 1.    When I pass a variable to the constructor of a class does it
> copy that variable or is it just a reference/pointer?  I was under the
> impression that it was just a pointer to the data.

For objects, until you make a copy, there is no copy made. That's the
general rule and even though it isn't always correct, it is correct
enough.

> 2.    When do I need to manually allocate/deallocate memory and when
> can I trust Python to take care of it?

Let things fall out of scope. If you're concerned, use delete. Try to
avoid using the global namespace for everything, and try to keep your
lists and dicts small.

> 3.    Any good practice suggestions?
>

Don't read in the entire file and then process it. Try to do line-by-
line processing.

Figure out what your algorithm is doing in terms of time *and* memory.
You likely have some O(N^2) or worse in memory usage.

Don't use Python variables to store data long-term. Instead, setup a
database or a file and use that. I'd first look at using a file, then
using SQLite, and then a full-fledged database like PostgreSQL.

Don't write processes that sit around for a long time unless you also
evaluate whether that process grows in size as it runs. If it does,
you need to figure out why and stop that memory leak.

Simpler code uses less memory. Not just because it is smaller, but
because you are not copying and moving data all over the place. See
what you can do to simplify your code. Maybe you'll expose the nasty
O(N^2) behavior.



More information about the Python-list mailing list