Memory usage per top 10x usage per heapy

Junkshops junkshops at gmail.com
Tue Sep 25 00:21:05 EDT 2012


> Just curious;  which is it, two million lines, or half a million bytes?
I have, in fact, this very afternoon, invented a means of writing a 
carriage return character using only 2 bits of information. I am 
prepared to sell licenses to this revolutionary technology for the low 
price of $29.95 plus tax.

Sorry, that should've been a 500Mb, 2M line file.

> which machine is 2gb, the Windows machine, or the VM?
VM. Winders is 4gb.

> ...but I would point out that just because
> you free up the memory from the Python doesn't mean it gets released
> back to the system.  The C runtime manages its own heap, and is pretty
> persistent about hanging onto memory once obtained.  It's not normally a
> problem, since most small blocks are reused.  But it can get
> fragmented.  And I have no idea how well Virtual Box maps the Linux
> memory map into the Windows one.
Right, I understand that - but what's confusing me is this: given that 
memory use is (I assume) monotonically increasing, the code should 
never use more than what heapy reports once all the data is loaded into 
memory, since memory released by the code to the Python runtime is 
reused. As best I can tell I'm not storing anything I shouldn't, so the 
only thing I can think of is that all the object creation and 
destruction is, for some reason, preventing reuse of memory. I'm at a 
bit of a loss as to what to try next.
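For what it's worth, one way to see the gap directly is to compare a 
heapy-style accounting of the objects you kept against the process RSS, 
from inside the same interpreter. A minimal sketch (Unix-only, using 
the stdlib resource module in place of guppy, with a made-up parse loop 
standing in for the real code):

```python
import resource
import sys

def rss_kb():
    # Peak resident set size of this process. On Linux ru_maxrss is
    # reported in kilobytes (on OS X it's bytes, so adjust there).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Stand-in for the real parse: keep a couple of derived strings per line.
derived = {}
for i in range(200000):
    line = "id_%d\tvalue_%d" % (i, i)
    key, val = line.split("\t")
    derived[key] = val

# Rough heapy-style accounting: shallow sizes of everything we kept.
kept = sys.getsizeof(derived) + sum(
    sys.getsizeof(k) + sys.getsizeof(v) for k, v in derived.items())

print("kept objects: ~%.1f MB" % (kept / 1048576.0))
print("process RSS:  ~%.1f MB" % (rss_kb() / 1024.0))
```

If RSS dwarfs the kept-object total even in a toy loop like this, the 
difference is interpreter/allocator overhead rather than anything the 
code is storing.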

Cheers, MrsE

On 9/24/2012 6:14 PM, Dave Angel wrote:
> On 09/24/2012 05:59 PM, MrsEntity wrote:
>> Hi all,
>>
>> I'm working on some code that parses a 500kb, 2M line file
> Just curious;  which is it, two million lines, or half a million bytes?
>
>> line by line and saves, per line, some derived strings into various data structures. I thus expect that memory use should monotonically increase. Currently, the program is taking up so much memory - even on 1/2 sized files - that on a 2GB machine
> which machine is 2gb, the Windows machine, or the VM?  You could get
> thrashing at either level.
>
>> I'm thrashing swap. What's strange is that heapy (http://guppy-pe.sourceforge.net/) is showing that the code uses about 10x less memory than reported by top, and the heapy data seems consistent with what I was expecting based on the objects the code stores. I tried using memory_profiler (http://pypi.python.org/pypi/memory_profiler) but it didn't really provide any illuminating information. The code does create and discard a number of objects per line of the file, but they should not be stored anywhere, and heapy seems to confirm that. So, my questions are:
>>
>> 1) For those of you kind enough to help me figure out what's going on, what additional data would you like? I didn't want to swamp everyone with the code and heapy/memory_profiler output but I can do so if it's valuable.
>> 2) How can I diagnose (and hopefully fix) what's causing the massive memory usage when it appears, from heapy, that the code is performing reasonably?
>>
>> Specs: Ubuntu 12.04 in Virtualbox on Win7/64, Python 2.7/64
>>
>> Thanks very much.
> Tim raised most of my concerns, but I would point out that just because
> you free up the memory from the Python doesn't mean it gets released
> back to the system.  The C runtime manages its own heap, and is pretty
> persistent about hanging onto memory once obtained.  It's not normally a
> problem, since most small blocks are reused.  But it can get
> fragmented.  And I have no idea how well Virtual Box maps the Linux
> memory map into the Windows one.
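Dave's point about the C runtime hanging on to memory is easy to 
demonstrate: create and drop a few million small objects and watch the 
resident set size. A minimal sketch (Linux-only, reading 
/proc/self/status; whether RSS falls back toward the baseline after the 
del depends on the allocator and on fragmentation, which is exactly the 
point):

```python
import gc

def vm_rss_kb():
    # Linux-only: current resident set size, straight from the kernel.
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("no VmRSS line; not Linux?")

baseline = vm_rss_kb()

# Allocate lots of small objects, as a per-line parser would.
junk = ["chunk-%d" % i for i in range(1000000)]
peak = vm_rss_kb()

# Drop every reference and collect; the interpreter may or may not
# hand the freed pages back to the OS.
del junk
gc.collect()
after = vm_rss_kb()

print("baseline %d kB, peak %d kB, after free %d kB"
      % (baseline, peak, after))
```

Heapy would report near-zero live objects after the del, while top 
keeps showing whatever the allocator decided to retain - the same 
10x-style discrepancy described above.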
