Memory usage per top 10x usage per heapy

Junkshops junkshops at gmail.com
Tue Sep 25 16:26:07 EDT 2012


On 9/25/2012 11:17 AM, Oscar Benjamin wrote:
> On 25 September 2012 19:08, Junkshops <junkshops at gmail.com> wrote:
>
>
>     In [38]: mpef._ustore._store
>     Out[38]: defaultdict(<type 'dict'>, {'Measurement':
>     {'8991c2dc67a49b909918477ee4efd767':
>     <micropheno.exchangeformat.Exceptions.FileContext object at
>     0x2f0fe90>, '7b38b429230f00fe4731e60419e92346':
>     <micropheno.exchangeformat.Exceptions.FileContext object at
>     0x2f0fad0>, 'b53531471b261c44d52f651add647544':
>     <micropheno.exchangeformat.Exceptions.FileContext object at
>     0x2f0f4d0>, '44ea6d949f7c8c8ac3bb4c0bf4943f82':
>     <micropheno.exchangeformat.Exceptions.FileContext object at
>     0x2f0f910>, '0de96f928dc471b297f8a305e71ae3e1':
>     <micropheno.exchangeformat.Exceptions.FileContext object at
>     0x2f0f550>}})
>
>
> Have these exceptions been raised from somewhere before being stored? 
> I wonder if you're inadvertently keeping execution frames alive. There 
> are some problems in CPython with this that are related to storing 
> exceptions.
FileContext objects aren't exceptions. They store information about 
where the stored object originally came from, so if there's an MD5 or ID 
clash with a later line in the file the code can report both the current 
line and the older clashing line to the user. I have an Exception 
subclass that takes a FileContext as an argument. There are no 
exceptions thrown in the file I processed to get the heapy results 
earlier in the thread.
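
To make the distinction concrete, here is a minimal sketch of the pattern described above. It is not the actual micropheno code: the attributes of FileContext, the DuplicateEntryError class, the add_entry helper, and the file name are all hypothetical, chosen only to show a plain context object being stored and handed to an Exception subclass when a clash occurs.

```python
class FileContext(object):
    """Hypothetical stand-in: records where a stored entry came from."""

    def __init__(self, filename, line_number):
        self.filename = filename
        self.line_number = line_number


class DuplicateEntryError(Exception):
    """Hypothetical Exception subclass that takes FileContext arguments."""

    def __init__(self, current, previous):
        Exception.__init__(
            self,
            "line %d of %s clashes with earlier line %d"
            % (current.line_number, current.filename, previous.line_number),
        )
        self.current = current
        self.previous = previous


# Maps an MD5 hex digest to the context of the line that first produced it.
store = {}


def add_entry(md5, context):
    """Store the context, or report both lines if the MD5 was seen before."""
    if md5 in store:
        raise DuplicateEntryError(context, store[md5])
    store[md5] = context


add_entry("8991c2dc67a49b909918477ee4efd767", FileContext("data.txt", 3))
```

In this sketch the stored objects are ordinary instances, so nothing is raised (and no traceback is created) unless a clash actually happens.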

>> In [43]: mpef._ustore._idstore['Measurement']._SIDstore
>> Out[43]: defaultdict(<function <lambda> at 0x2ece7d0>, 
>> {'emailRemoved': defaultdict(<function <lambda> at 0x2c4caa0>, 
>> {'microPhenoShew2011': defaultdict(<type 'dict'>, {0: 
>> {'MLR_124572462': '8991c2dc67a49b909918477ee4efd767', 
>> 'MLR_124572161': '7b38b429230f00fe4731e60419e92346', 
>> 'SMMLR_12551352': 'b53531471b261c44d52f651add647544', 
>> 'SMMLR_12551051': '0de96f928dc471b297f8a305e71ae3e1', 
>> 'SMMLR_12550750': '44ea6d949f7c8c8ac3bb4c0bf4943f82'}})})})
> Also I think lambda functions might be able to keep the frame alive. 
> Are they by any chance being created in a function that is called in a 
> loop?
>
Here's the context for the lambdas:

   def __init__(self):
       self._SIDstore = defaultdict(
           lambda: defaultdict(lambda: defaultdict(dict)))

So the default factories (the two lambdas and dict) are only called when 
a new key is added to one of the top 3 levels of the data structure, 
which, in the test case I've been discussing, only happens once per level.
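
A minimal sketch of that behaviour, assuming the same three-level nesting as _SIDstore. The named factory functions and the calls counter are hypothetical; they stand in for the lambdas only so that the number of calls can be counted. The keys are taken from the Out[43] dump above.

```python
from collections import defaultdict

# Counts how often each default factory runs.
calls = {"outer": 0, "inner": 0}


def inner_factory():
    # Stands in for the inner lambda: builds the third level.
    calls["inner"] += 1
    return defaultdict(dict)


def outer_factory():
    # Stands in for the outer lambda: builds the second level.
    calls["outer"] += 1
    return defaultdict(inner_factory)


sid_store = defaultdict(outer_factory)

entries = [
    ("MLR_124572462", "8991c2dc67a49b909918477ee4efd767"),
    ("MLR_124572161", "7b38b429230f00fe4731e60419e92346"),
    ("SMMLR_12551352", "b53531471b261c44d52f651add647544"),
]

# Every entry shares the same three leading keys, so each factory
# should run only on the first insertion.
for sid, md5 in entries:
    sid_store["emailRemoved"]["microPhenoShew2011"][0][sid] = md5

print(calls)
```

Note that a lookup of a missing key also triggers the factory, so reading with .get() or an "in" test avoids creating empty levels by accident.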

Although the suggestion to change the hex strings to ints is a good one 
and I'll do it, what I'm really trying to understand is why the memory 
use reported by top (along with the fact that the code appears to thrash 
swap) is so much larger than both the figure reported by heapy and my 
own calculations of how much memory the code should be using.
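
For the hex-string-to-int change, a rough sketch of how the per-object saving could be checked, assuming CPython and one of the MD5 values from the dump above. sys.getsizeof reports only the size of the object itself, so this says nothing about the dicts holding the values or about the gap between top and heapy.

```python
import sys

# One of the MD5 hex digests from the Out[38] dump.
md5_hex = "8991c2dc67a49b909918477ee4efd767"

# The same 128-bit value held as an integer.
md5_int = int(md5_hex, 16)

hex_size = sys.getsizeof(md5_hex)
int_size = sys.getsizeof(md5_int)

# The original hex form can be recovered for display, zero-padded to 32 digits.
round_trip = format(md5_int, "032x")

print("hex string: %d bytes, int: %d bytes" % (hex_size, int_size))
```

The exact byte counts depend on the Python version and build, so they are best measured on the interpreter actually running the code.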

Cheers, MrsEntity