Memory usage per top 10x usage per heapy

Dave Angel d at davea.name
Tue Sep 25 07:06:29 EDT 2012


On 09/25/2012 12:21 AM, Junkshops wrote:
>> Just curious;  which is it, two million lines, or half a million bytes?
<snip>
> 
> Sorry, that should've been a 500MB, 2M line file.
> 
>> which machine is 2gb, the Windows machine, or the VM?
> VM. Winders is 4gb.
> 
>> ...but I would point out that just because
>> you free up the memory from the Python doesn't mean it gets released
>> back to the system.  The C runtime manages its own heap, and is pretty
>> persistent about hanging onto memory once obtained.  It's not normally a
>> problem, since most small blocks are reused.  But it can get
>> fragmented.  And I have no idea how well VirtualBox maps the Linux
>> memory map into the Windows one.
> Right, I understand that - but what's confusing me is that, given the
> memory use is (I assume) monotonically increasing, the code should never
> use more than what's reported by heapy once all the data is loaded into
> memory, given that memory released by the code to the Python runtime is
> reused. To the best of my ability to tell I'm not storing anything I
> shouldn't, so the only thing I can think of is that all the object
> creation and destruction, for some reason, is preventing reuse of
> memory. I'm at a bit of a loss regarding what to try next.

I'm not familiar with heapy, but perhaps it's missing something there.
I'm a bit surprised you aren't beyond the 2 GB limit, just with the
structures you describe for the file.  You do realize that each object
has quite a few bytes of overhead, so it's not surprising to use several
times the size of a file to store the file in an organized way.  I also
wonder if heapy has been written to take into account the larger size of
pointers in a 64-bit build.
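
To get a rough feel for that per-object overhead, something like the
following could be run on the VM.  It's only a sketch: the 50-character
record and the three-way split into fields are made up for illustration,
and sys.getsizeof() reports only each object's own size, not what it
refers to.

```python
import struct
import sys

# Hypothetical 50-character record standing in for one line of the file.
line = "a" * 50
raw_chars = len(line)

# Size of the string object itself, header included.
str_size = sys.getsizeof(line)

# Splitting the record into fields adds a tuple plus one header per field.
# getsizeof(tuple) does not count the elements, so add them separately.
fields = (line[:10], line[10:30], line[30:])
fields_size = sys.getsizeof(fields) + sum(sys.getsizeof(f) for f in fields)

# Pointer width in bytes for this interpreter build.
pointer_bytes = struct.calcsize("P")

print("raw characters:    %d" % raw_chars)
print("one str object:    %d bytes" % str_size)
print("tuple of 3 fields: %d bytes" % fields_size)
print("pointer size:      %d bytes" % pointer_bytes)
```

Multiplying the difference by two million lines gives an idea of how far
the in-memory size can drift from the size of the file on disk.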

Perhaps one way to save space would be to use a long to store those md5
values.  You'd have to measure it, but I suspect it'd help (at the cost
of lots of extra hexlify-type calls).  Another thing is to make sure
that the md5 object used in your two maps is the same object, and not
just one with the same value.
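
Both ideas can be sketched roughly as below.  The helper names
(md5_as_int, int_to_hex, share) and the sample data are hypothetical, not
anything from your code, and on Python 3 a plain int plays the role of
Python 2's long.  The sizes printed are whatever your interpreter
reports, so measure before committing to it.

```python
import hashlib
import sys

def md5_as_int(data):
    """Return the 128-bit MD5 digest of data as an integer."""
    return int(hashlib.md5(data).hexdigest(), 16)

def int_to_hex(value):
    """Convert an integer digest back to the usual 32-character hex form."""
    return "%032x" % value

# Compare the size of the two representations of one digest.
sample = b"example record"  # made-up input
hex_digest = hashlib.md5(sample).hexdigest()
int_digest = md5_as_int(sample)
print("hex string: %d bytes" % sys.getsizeof(hex_digest))
print("integer:    %d bytes" % sys.getsizeof(int_digest))

# Make sure both maps refer to one object per digest, not to two equal copies.
_canonical = {}

def share(value):
    """Return the first object seen that is equal to value."""
    return _canonical.setdefault(value, value)

by_md5 = {}
by_name = {}
first = share(md5_as_int(sample))
second = share(md5_as_int(sample))   # equal value, computed separately
by_md5[first] = "record-1"
by_name["record-1"] = second
print("same object in both maps: %s" % (first is second))
```

The extra dictionary costs memory of its own, so it could be dropped
once both maps are built.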


-- 

DaveA


