Tree structure consuming lot of memory

Chris Rebert clp2 at rebertia.com
Mon Jul 6 17:03:48 EDT 2009


> On Tue, Jul 7, 2009 at 1:28 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>
>> mayank gupta <mooniitk <at> gmail.com> writes:
>> >
>> > After a little analysis, I found out that in general it uses about
>> > 1.4 KB of memory for each node!!
>>
>> How did you measure memory use? Python objects are not very compact, but
>> 1.4KB per object seems a bit too much (I would expect something more like
>> 150-200 bytes/object in 32-bit mode, or 300-400 bytes/object in 64-bit
>> mode).
>>
>> One of the solutions is to use __slots__, as already suggested. Another,
>> which will have similar benefits, is to use a namedtuple. Both suppress the
>> instance dictionary (`instance`.__dict__), which is a major contributor to
>> memory consumption. Illustration (64-bit mode, by the way):
>>
>> >>> import sys
>> >>> from collections import namedtuple
>>
>> # First a normal class
>> >>> class Node(object): pass
>> ...
>> >>> o = Node()
>> >>> o.value = 1
>> >>> o.children = ()
>> >>>
>> >>> sys.getsizeof(o)
>> 64
>> >>> sys.getsizeof(o.__dict__)
>> 280
>> # The object seems to take a mere 64 bytes, but the attribute dictionary
>> # adds a whopping 280 bytes and bumps the actual size to 344 bytes!
>>
>> # Now a namedtuple (a tuple subclass with property accessors for the various
>> # tuple items)
>> >>> Node = namedtuple("Node", "value children")
>> >>>
>> >>> o = Node(value=1, children=())
>> >>> sys.getsizeof(o)
>> 72
>> >>> sys.getsizeof(o.__dict__)
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>> AttributeError: 'Node' object has no attribute '__dict__'
>>
>> # The object doesn't have a __dict__, so 72 bytes is its real total size.
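Antoine mentions __slots__ but only illustrates the namedtuple route. Below is a minimal sketch of the __slots__ variant, assuming a modern Python 3 interpreter. The class names DictNode and SlotNode and the helper shallow_total() are hypothetical, chosen only for this comparison. Exact byte counts vary between Python versions and builds, so none are claimed here.

```python
import sys


class DictNode:
    """Ordinary class: attributes live in a per-instance __dict__."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = children


class SlotNode:
    """Same fields, but __slots__ reserves fixed storage and drops __dict__."""

    __slots__ = ("value", "children")

    def __init__(self, value, children=()):
        self.value = value
        self.children = children


def shallow_total(obj):
    """Size of obj itself plus its instance dictionary, if it has one.

    Objects the attributes point to (value, children) are not counted.
    """
    size = sys.getsizeof(obj)
    d = getattr(obj, "__dict__", None)
    if d is not None:
        size += sys.getsizeof(d)
    return size


a = DictNode(1)
b = SlotNode(1)
print(hasattr(a, "__dict__"), hasattr(b, "__dict__"))  # True False
print(shallow_total(a), shallow_total(b))
```

The trade-off is that a slotted instance can no longer grow new attributes at runtime. Like the namedtuple, it keeps only the fixed fields in the object itself.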
On Mon, Jul 6, 2009 at 1:30 PM, mayank gupta <mooniitk at gmail.com> wrote:
> I wrote a small program that initializes about 1,000,000 nodes with some
> attributes, and watched the memory usage on my Linux machine (using the
> 'top' command). Then I averaged the memory usage per node. I know this is
> not the most accurate way, but it is just for an estimated value.
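The build-many-nodes-and-average approach described here can be made more precise by counting Python allocations directly instead of reading the whole process size from top. Below is a rough sketch using the standard tracemalloc module, which appeared in Python 3.4, well after this thread. The Node class and the bytes_per_node() helper are hypothetical stand-ins for the poster's code.

```python
import tracemalloc


class Node:
    """Hypothetical stand-in for the poster's tree node."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = children


def bytes_per_node(node_cls, count=100_000):
    """Average number of bytes allocated per node while building `count` nodes."""
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        # Keep the nodes referenced so they are still allocated when we measure.
        nodes = [node_cls(i) for i in range(count)]
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return (after - before) / count


print(f"~{bytes_per_node(Node):.0f} bytes per node")
```

The figure covers everything allocated while building the nodes: the node objects, their per-instance attribute storage, the integer values, and the list that keeps the nodes alive. It excludes interpreter overhead that top would also include.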

You should try the more accurate sys.getsizeof() function:
http://docs.python.org/library/sys.html#sys.getsizeof
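One caveat: sys.getsizeof() reports only the object itself, not the objects it references. That is why the illustration above has to add the __dict__ separately. Below is a hedged sketch of a recursive measurement for a whole tree. The Node class and deep_sizeof() are hypothetical. The walk only follows instance dictionaries and built-in containers, so attributes stored in __slots__ are not followed. Shared objects such as interned attribute names are counted once, and very deep trees may hit the recursion limit.

```python
import sys


class Node:
    """Hypothetical tree node: a value plus a tuple of child nodes."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = children


def deep_sizeof(obj, seen=None):
    """Approximate bytes reachable from obj, counting each object once.

    Follows instance __dict__s and built-in containers only; attributes
    stored in __slots__ are not followed.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    d = getattr(obj, "__dict__", None)
    if d is not None:
        size += deep_sizeof(d, seen)
    if isinstance(obj, dict):
        for key, val in obj.items():
            size += deep_sizeof(key, seen) + deep_sizeof(val, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_sizeof(item, seen)
    return size


leaf = Node(2)
root = Node(1, (leaf,))
print(sys.getsizeof(root), deep_sizeof(root))
```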

Cheers,
Chris

P.S. Please don't top-post in the future.
-- 
http://blog.rebertia.com


