python reading file memory cost

张彤 tzhang at sinap.ac.cn
Tue Aug 2 07:00:39 EDT 2011


Thanks Peter! Your explanation is great!
And one more question:
Why does it still keep the memory even after I del the large array in
interactive Python mode?

-----Original Message-----
From: Peter Otten [mailto:__peter__ at web.de] 
Sent: Tuesday, August 02, 2011 4:26 PM
To: python-list at python.org
Subject: Re: python reading file memory cost

Chris Rebert wrote:

>> The result was that reading a 500M file consumed almost 2GB of RAM;
>> I cannot figure it out, can somebody help?
> 
> If you could store the floats themselves, rather than their string 
> representations, that would be more space-efficient. You could then 
> also use the `array` module, which is more space-efficient than lists 
> (http://docs.python.org/library/array.html ). Numpy would also be 
> worth investigating since multidimensional arrays are involved.
> 
> The next obvious question would then be: do you /really/ need /all/ of 
> the data in memory at once?

This is what you (OP) should think about really hard before resorting to the
optimizations mentioned above. Perhaps you can explain what you are doing
with the data once you've loaded it into memory?
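
For instance, if you only need aggregate statistics, a sketch like the
following (assuming rows of whitespace-separated floats in a hypothetical
file "data.txt") processes the file one line at a time and never holds
more than a single line in memory:

# Sketch: stream over the file instead of loading it all at once.
# "data.txt" is a placeholder name; adjust the per-line processing
# to whatever you actually need to compute.
total = 0.0
count = 0
with open("data.txt") as f:
    for line in f:            # iterating a file yields one line at a time
        for x in line.split():
            total += float(x)
            count += 1
print("mean:", total / count)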

> Also, just so you're aware:
> http://docs.python.org/library/sys.html#sys.getsizeof

To give you an idea of how memory usage explodes:

>>> import sys
>>> line = "1.23 4.56 7.89 0.12\n"
>>> len(line) # size in the file
20
>>> sys.getsizeof(line) # size of the str object in memory
60
>>> formatted = ["%2.6E" % float(x) for x in line.split()]
>>> sys.getsizeof(formatted) + sum(sys.getsizeof(s) for s in formatted) # list plus its strings
312
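
For comparison, here is a rough sketch of the array.array approach Chris
suggested (the exact getsizeof results vary with Python version and
platform): storing the parsed values as C doubles costs 8 bytes per float
plus one small fixed header, instead of a separate string object per number.

import sys
import array

line = "1.23 4.56 7.89 0.12\n"
# typecode 'd' stores C doubles: 8 bytes per value in one contiguous
# buffer, rather than one Python object per number
floats = array.array("d", (float(x) for x in line.split()))
print(len(floats))            # 4
print(sys.getsizeof(floats))  # roughly header + 4 * 8 bytes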
