Memory usage per top 10x usage per heapy

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Sep 25 17:17:52 EDT 2012


On 25 September 2012 21:26, Junkshops <junkshops at gmail.com> wrote:

>  On 9/25/2012 11:17 AM, Oscar Benjamin wrote:
>
> On 25 September 2012 19:08, Junkshops <junkshops at gmail.com> wrote:
>
>>
>> In [38]: mpef._ustore._store
>> Out[38]: defaultdict(<type 'dict'>, {'Measurement':
>> {'8991c2dc67a49b909918477ee4efd767':
>> <micropheno.exchangeformat.Exceptions.FileContext object at 0x2f0fe90>,
>> '7b38b429230f00fe4731e60419e92346':
>> <micropheno.exchangeformat.Exceptions.FileContext object at 0x2f0fad0>,
>> 'b53531471b261c44d52f651add647544':
>> <micropheno.exchangeformat.Exceptions.FileContext object at 0x2f0f4d0>,
>> '44ea6d949f7c8c8ac3bb4c0bf4943f82':
>> <micropheno.exchangeformat.Exceptions.FileContext object at 0x2f0f910>,
>> '0de96f928dc471b297f8a305e71ae3e1':
>> <micropheno.exchangeformat.Exceptions.FileContext object at 0x2f0f550>}})
>>
>
>  Have these exceptions been raised from somewhere before being stored? I
> wonder if you're inadvertently keeping execution frames alive. There are
> some problems in CPython with this that are related to storing exceptions.
>
> FileContext objects aren't exceptions. They store information about where
> the stored object originally came from, so if there's an MD5 or ID clash
> with a later line in the file the code can report both the current line and
> the older clashing line to the user. I have an Exception subclass that
> takes a FileContext as an argument. There are no exceptions thrown in the
> file I processed to get the heapy results earlier in the thread.
>

I don't know whether it would be better or worse, but it might be worth
seeing what happens if you replace the FileContext objects with tuples.
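
For illustration, a minimal sketch of what that swap might look like, using a
namedtuple so attribute access survives; the field names here are hypothetical,
since the thread doesn't show what FileContext actually stores:

```python
import sys
from collections import namedtuple

# Hypothetical fields -- substitute whatever FileContext actually stores.
FileContextTuple = namedtuple('FileContextTuple', ['filename', 'lineno'])

ctx = FileContextTuple('measurements.tsv', 124)

# A namedtuple instance is a plain tuple underneath: no per-instance
# __dict__, so each one is typically smaller than an ordinary class
# instance while keeping ctx.filename / ctx.lineno access.
print(sys.getsizeof(ctx))
```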


>
>
>  In [43]: mpef._ustore._idstore['Measurement']._SIDstore
> Out[43]: defaultdict(<function <lambda> at 0x2ece7d0>, {'emailRemoved':
> defaultdict(<function <lambda> at 0x2c4caa0>, {'microPhenoShew2011':
> defaultdict(<type 'dict'>, {0: {'MLR_124572462':
> '8991c2dc67a49b909918477ee4efd767', 'MLR_124572161':
> '7b38b429230f00fe4731e60419e92346', 'SMMLR_12551352':
> 'b53531471b261c44d52f651add647544', 'SMMLR_12551051':
> '0de96f928dc471b297f8a305e71ae3e1', 'SMMLR_12550750':
> '44ea6d949f7c8c8ac3bb4c0bf4943f82'}})})})
>
> Also I think lambda functions might be able to keep the frame alive. Are
> they by any chance being created in a function that is called in a loop?
>
>   Here's the context for the lambdas:
>
>   def __init__(self):
>     self._SIDstore = defaultdict(lambda: defaultdict(lambda:
> defaultdict(dict)))
>
> So the lambda is only being called when a new key is added to the top 3
> levels of the datastructure, which in the test case I've been discussing,
> only happens once each.
>

I can't see anything wrong with that, but then I'm not sure whether a lambda
always keeps its frame alive. If there's only that one line in the __init__
function then I'd expect it to be fine.
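
One way to take the lambda question off the table entirely: functools.partial
objects carry no closure or frame reference at all, so (assuming the structure
shown above) the same three-level store can be built without any lambdas. A
sketch:

```python
from collections import defaultdict
from functools import partial

# Same shape as defaultdict(lambda: defaultdict(lambda: defaultdict(dict))),
# but built from partial objects, which hold no reference to any enclosing
# frame. (The lambdas above close over nothing, so they shouldn't pin a
# frame either -- this is just a way to be certain.)
def make_sidstore():
    return defaultdict(partial(defaultdict, partial(defaultdict, dict)))

store = make_sidstore()
store['emailRemoved']['microPhenoShew2011'][0]['MLR_124572462'] = \
    '8991c2dc67a49b909918477ee4efd767'
```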


>
> Although the suggestion to change the hex strings to ints is a good one and
> I'll do it, what I'm really trying to understand is why there's such a large
> difference between the memory use reported by top (and the fact that the
> code appears to thrash swap) and the usage reported by heapy and my own
> calculations of how much memory the code should be using.
>
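
The saving from that hex-to-int change is easy to estimate with
sys.getsizeof (exact byte counts vary by Python version and platform):

```python
import sys

hex_key = '8991c2dc67a49b909918477ee4efd767'  # 32-char MD5 hex digest
int_key = int(hex_key, 16)                    # same value as a 128-bit int

# The int representation is considerably smaller per key; multiplied
# across many stored digests, the difference adds up.
print(sys.getsizeof(hex_key), sys.getsizeof(int_key))
```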

Perhaps you could see what objgraph comes up with:
http://pypi.python.org/pypi/objgraph

So far as I know objgraph doesn't tell you how big objects are, but it does
give a nice graphical representation of which objects are alive and which
other objects they are referenced by. You might find that some other object
is kept alive that you didn't expect.
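
If pulling in objgraph is inconvenient, the stdlib gc module gives a rougher,
text-only version of the same back-reference information; a sketch:

```python
import gc
from collections import Counter

# Count live objects by type -- similar in spirit to
# objgraph.show_most_common_types().
counts = Counter(type(o).__name__ for o in gc.get_objects())
for name, n in counts.most_common(10):
    print(name, n)

# For one suspicious object, gc.get_referrers() lists everything that
# holds a reference to it -- the textual cousin of objgraph's graphs.
suspect = {}
holder = [suspect]
referrers = [r for r in gc.get_referrers(suspect) if r is holder]
```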

Oscar


More information about the Python-list mailing list