Python does not take up available physical memory

Pradipto Banerjee pradipto.banerjee at adainvestments.com
Sun Oct 21 10:14:59 EDT 2012


I tried this on a different PC with 12 GB of RAM. As expected, this time reading the data was no issue. I noticed that for large files, when each line of the file is kept as a string in a Python list, Python takes up about 2.5x the file's size on disk. As an anecdote, the equivalent overhead in MATLAB is about 2x, slightly lower than Python's, with each line of the file kept as a string in a MATLAB cell. I'm curious: has anyone compared the in-memory overhead of data in other languages, for instance Ruby?
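A rough Python 3 sketch of this kind of measurement reads a file into a list of lines and compares sys.getsizeof for the list plus each distinct string against the file's size on disk. It assumes CPython and counts only the list and str objects (not allocator slack or other interpreter overhead), so it is an estimate rather than the process's real memory use; the function names and the sample data are hypothetical.

```python
import os
import sys
import tempfile


def list_of_lines_footprint(lines):
    # sys.getsizeof is shallow: it reports the list object (header plus one
    # pointer per slot) but not the strings it points to, so add those too.
    # De-duplicate by id() in case the same string object appears twice
    # (CPython caches some one-character strings, for example).
    unique = {id(s): s for s in lines}.values()
    return sys.getsizeof(lines) + sum(sys.getsizeof(s) for s in unique)


def memory_to_disk_ratio(path):
    """In-memory size of the file's lines as a list of str, over its size on disk."""
    with open(path) as f:
        lines = f.readlines()
    return list_of_lines_footprint(lines) / os.path.getsize(path)


if __name__ == "__main__":
    # Hypothetical sample data: 10,000 short ASCII lines.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w") as f:
            for i in range(10_000):
                f.write(f"record {i:06d} some payload text\n")
        print(f"memory / disk: {memory_to_disk_ratio(path):.2f}")
```

Because every string carries a fixed per-object header, the ratio climbs as lines get shorter and approaches 1 for very long ASCII lines.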


-----Original Message-----
From: Python-list [mailto:python-list-bounces+pradipto.banerjee=adainvestments.com at python.org] On Behalf Of Steven D'Aprano
Sent: Friday, October 19, 2012 6:12 PM
To: python-list at python.org
Subject: Re: Python does not take up available physical memory

On Fri, 19 Oct 2012 14:03:37 -0500, Pradipto Banerjee wrote:

> Thanks, I tried that. I still got a MemoryError, but at least this time
> Python tried to use the physical memory. What I noticed is that before
> it gave me the error it used up to 1.5 GB (of the 2.23 GB originally
> shown as available), so in general Python takes up more memory than
> the size of the file itself.

Well of course it does. Once you read the data into memory, it has its
own overhead for the object structure.

You haven't told us what the file is or how you are reading it. I'm going
to assume it is ASCII text and you are using Python 2.

py> import os, sys
py> open("test file", "w").write("abcde")
py> os.stat("test file").st_size
5L
py> text = open("test file", "r").read()
py> len(text)
5
py> sys.getsizeof(text)
26

So that confirms that a five-byte ASCII string takes up five bytes on
disk but, on this Python build, 26 bytes in memory as an object.

That overhead will depend on the kind of object, whether it is Unicode or
not, the version of Python, and how you read the data.
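On Python 3 the same comparison can be sketched with sys.getsizeof, assuming CPython 3.3 or later (where strings use the flexible storage described in PEP 393). Header sizes differ between versions and between 32- and 64-bit builds, so this prints the numbers rather than expecting the 26 bytes from the Python 2 session above; the sample strings are arbitrary.

```python
import sys

# Illustrative payloads: the same five characters stored as bytes, as a
# pure-ASCII str, and as a str containing the euro sign, which makes CPython
# store every character of that string in two bytes instead of one.
samples = {
    "bytes": b"abcde",
    "ASCII str": "abcde",
    "non-ASCII str": "abcd\u20ac",
}

for label, obj in samples.items():
    # len() counts bytes or characters; sys.getsizeof() also counts the
    # fixed per-object header that CPython stores alongside the data.
    print(f"{label:14} len={len(obj)} getsizeof={sys.getsizeof(obj)}")

# The header is paid once per object, so one extra ASCII character
# should cost only one extra byte.
print("per-character cost:", sys.getsizeof("abcdef") - sys.getsizeof("abcde"))
```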

In general, if you have a huge amount of data to work with, you should
try to work with it one line at a time:

with open("some file") as f:
    for line in f:  # reads one line per iteration
        process(line)  # process() is a placeholder for your own code


rather than reading the whole file into memory at once:

with open("some file") as f:
    lines = f.readlines()  # builds a list holding every line at once
for line in lines:
    process(line)
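To see the difference between the two approaches, here is a rough Python 3 sketch that uses the standard tracemalloc module to record the peak memory allocated by Python while a synthetic file is processed both ways. The file contents, the line count and the helper names (write_sample_file, peak_line_by_line, peak_readlines, compare) are made up for illustration, and summing line lengths stands in for process(). Exact byte counts will vary with the Python version and platform.

```python
import os
import tempfile
import tracemalloc


def write_sample_file(path, n_lines):
    """Write a hypothetical test file of short ASCII lines."""
    with open(path, "w") as f:
        for i in range(n_lines):
            f.write(f"line {i:08d} some payload text\n")


def peak_line_by_line(path):
    """Return (characters seen, peak traced bytes) when iterating over the file."""
    tracemalloc.start()
    total = 0
    with open(path) as f:
        for line in f:
            total += len(line)  # stand-in for process(line)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return total, peak


def peak_readlines(path):
    """Return (characters seen, peak traced bytes) when reading every line first."""
    tracemalloc.start()
    total = 0
    with open(path) as f:
        lines = f.readlines()  # every line is now a separate str object
    for line in lines:
        total += len(line)  # stand-in for process(line)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return total, peak


def compare(n_lines=100_000):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        write_sample_file(path, n_lines)
        size_on_disk = os.path.getsize(path)
        streamed = peak_line_by_line(path)
        slurped = peak_readlines(path)
    return size_on_disk, streamed, slurped


if __name__ == "__main__":
    size, (n1, peak1), (n2, peak2) = compare()
    print(f"file size on disk:  {size:>12,} bytes")
    print(f"peak, line by line: {peak1:>12,} bytes")
    print(f"peak, readlines():  {peak2:>12,} bytes")
```

Iterating over the file keeps only a read buffer and the current line alive, so its peak stays roughly constant as the file grows, while the readlines() peak grows with the number of lines.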



--
Steven
--
http://mail.python.org/mailman/listinfo/python-list



