memory problem with list creation

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Wed Jan 13 21:03:52 EST 2010


On Wed, 13 Jan 2010 06:24:04 -0800, Allard Warrink wrote:

> Within a python script I'm using a couple of different lists containing
> a large number of floats (+8M). The execution of this script fails
> because of a memory error (insufficient memory). I thought this was
> strange because I delete all lists that are no longer necessary
> directly and my workstation theoretically has more than enough memory to
> run the script.

Keep in mind that Python floats are rich objects, not bare machine 
floats, and so take up more space: 16 bytes on a 32-bit system, compared 
to the 8 bytes typically taken by the C double they wrap. (Both figures 
may vary on other hardware or operating systems.)
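For example, on a 32-bit box (the exact figures will vary with your 
build):

>>> import sys, struct
>>> sys.getsizeof(0.0)    # a full Python float object
16
>>> struct.calcsize('d')  # the raw C double it wraps
8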

Also keep in mind that your Python process may not have access to all 
your machine's memory -- some OSes default to relatively small per-
process memory limits. If you are using a Unix or Linux, you may need to 
look at ulimit.
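On a Unix-like system you can also check the limits from inside Python 
with the resource module -- a quick sketch, noting that RLIMIT_AS isn't 
available on every platform, and that -1 means "unlimited":

>>> import resource
>>> resource.getrlimit(resource.RLIMIT_AS)    # (soft, hard) address-space limit in bytes
(-1, -1)
>>> resource.getrlimit(resource.RLIMIT_DATA)  # data segment limit
(-1, -1)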



> so I did some investigation on the memory use of the script. I found out
> that when I populated the lists with floats using a for ... in range()
> loop a lot of overhead memory is used and that this memory is not freed
> after populating the list and is also not freed after deleting the list.

I would be very, very, very surprised if the memory truly wasn't freed 
after deleting the lists. A memory leak of that magnitude is unlikely to 
have remained undetected until now. More likely you're either 
misdiagnosing the problem, or you have some sort of reference cycle.
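If a cycle is the culprit, a toy example (deliberately constructed, just 
to illustrate the effect) shows how memory can linger until the cycle 
collector runs:

>>> import gc
>>> a = [0.0] * (2700*3250)
>>> b = {'back': a}
>>> a.append(b)        # a and b now refer to each other
>>> del a, b           # reference counting alone can't reclaim them
>>> n = gc.collect()   # but the cycle collector can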



 
> This way the memory keeps filling up after each newly populated list
> until the script crashes.

Can you post us the smallest extract of your script that crashes?



> I did a couple of tests and found that populating lists with range or
> xrange is responsible for the memory overhead. 

I doubt it. Even using range with 8+ million integers only wastes 35 MB 
or so for the throwaway list itself. That's wasteful, but not 
excessively so.



> Does anybody know why
> this happens and if there's a way to avoid this memory problem?
> 
> First the line(s) of Python code I executed. Then the memory usage of the
> process: Mem usage after creation/populating of big_list
> sys.getsizeof(big_list)
> Mem usage after deletion of big_list
> 
> big_list = [0.0] * 2700*3250
> 40
> 35
> 6


You don't specify what those three numbers are (the middle one is 
getsizeof of the list, but the other two are unknown). How do you 
calculate memory usage? I don't believe that your memory usage is 6 
bytes! Nor do I believe that getsizeof(big_list) returns 35 bytes!

On my system:

>>> import sys
>>> x = [0.0] * 2700*3250
>>> sys.getsizeof(x)
35100032


 
> big_list = [0.0 for i in xrange(2700*3250)]
> 40
> 36
> 6

This produces a lightweight xrange object, then wastefully iterates over 
it to produce a list of eight million references to the float 0.0. The 
xrange object is then garbage collected automatically.
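Note that all eight million slots end up referring to the very same 
float object, which you can check for yourself:

>>> big_list = [0.0 for i in xrange(2700*3250)]
>>> all(item is big_list[0] for item in big_list)
True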


> big_list = [0.0 for i in range(2700*3250)]
> 145
> 36
> 110

This produces a list containing the first 8+ million integers, then 
wastefully iterates over it to produce a second list of eight million 
references to the float 0.0, before garbage collecting the first list. 
So at its peak, you need 35100032 bytes for a pointless intermediate 
list (not counting the eight million int objects it refers to), roughly 
doubling the peak memory needed to generate the list you actually want.
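You can see the cost of that throwaway list directly. The figures below 
are from a 32-bit build and will vary on yours; note too that getsizeof 
only counts the list's pointer array, not the int objects it points at:

>>> import sys
>>> sys.getsizeof(range(2700*3250))   # the intermediate list of integers
35100032
>>> sys.getsizeof(xrange(2700*3250))  # the xrange object is tiny
20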



> big_list = [float(i) for i in xrange(2700*3250)] 
> 180
> 36
> 145

Again, the technique you are using does a lot of extra work. The values 
yielded by the xrange object are integers, and calling float on each one 
creates eight million distinct float objects (roughly 16 bytes apiece on 
a 32-bit system), so this version genuinely needs far more memory than a 
list of references to a single 0.0. And again, the memory usage you 
claim is utterly implausible.
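For what it's worth, a back-of-the-envelope estimate for that version, 
using the 16 bytes per float mentioned above plus the 35100032-byte 
pointer array:

>>> 2700 * 3250 * 16          # eight million distinct float objects
140400000
>>> 140400000 + 35100032      # plus the list that holds the references
175500032

Whatever units your three numbers are in, they clearly aren't bytes.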


To really solve this problem, we need to see actual code that raises 
MemoryError. Otherwise we're just wasting time.



-- 
Steven


