python reading file memory cost

Peter Otten __peter__ at web.de
Tue Aug 2 04:26:22 EDT 2011


Chris Rebert wrote:

>> When I ran it, reading a 500 MB file consumed almost 2 GB of RAM. I
>> cannot figure out why; can somebody help?
> 
> If you could store the floats themselves, rather than their string
> representations, that would be more space-efficient. You could then
> also use the `array` module, which is more space-efficient than lists
> (http://docs.python.org/library/array.html). Numpy would also be
> worth investigating since multidimensional arrays are involved.
> 
> The next obvious question would then be: do you /really/ need /all/ of
> the data in memory at once?

This is what you (OP) should think about really hard before resorting to the 
optimizations mentioned above. Perhaps you can explain what you are doing 
with the data once you've loaded it into memory?
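
For instance, if each line is one row of your matrix and you only need to 
make a single pass over the data, a sketch along these lines keeps just one 
row in memory at a time (the filename and the whitespace-separated layout 
are guesses, since we haven't seen your code):

from array import array

def rows(path):
    # Yield one row of floats at a time; only the current line
    # is ever held in memory.
    with open(path) as f:
        for line in f:
            yield array("d", (float(x) for x in line.split()))

# "data.txt" and whitespace-separated columns are assumptions
# about your file format; adapt as needed.
for row in rows("data.txt"):
    pass  # process the row here, e.g. update running totals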

> Also, just so you're aware:
> http://docs.python.org/library/sys.html#sys.getsizeof

To give you an idea of how memory usage explodes:

>>> import sys
>>> line = "1.23 4.56 7.89 0.12\n"
>>> len(line) # size in the file
20
>>> sys.getsizeof(line)
60
>>> formatted = ["%2.6E" % float(x) for x in line.split()]
>>> sys.getsizeof(formatted) + sum(sys.getsizeof(s) for s in formatted)
312
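
For comparison, the same four values stored as machine doubles in an 
array.array occupy only a fraction of that. Exact byte counts vary with 
Python version and platform, so measure on your own build:

import sys
from array import array

line = "1.23 4.56 7.89 0.12\n"
floats = array("d", (float(x) for x in line.split()))
# four C doubles at 8 bytes each plus a small fixed header,
# versus the 312 bytes for the list of strings above
print(sys.getsizeof(floats))

If you do need the whole file at once, numpy.loadtxt() would similarly 
give you the entire file as one compact two-dimensional float array.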
