some comments for Python 3000

Rainer Deyke root at rainerdeyke.com
Mon Aug 14 16:35:46 EDT 2000


"Bernhard Herzog" <herzog at online.de> wrote in message
news:m34s4nzofg.fsf at greebo.nodomain.de...
> "Rainer Deyke" <root at rainerdeyke.com> writes:
>
> > Much worse, in fact.  In some cases, C outperforms Python by over 100:1.
> > And memory usage is even worse.  In C, I can have an integer variable with
> > range 0 to 255 in a single byte.  In Python, I need at least two objects
> > (the integer and the string that holds the integer's name), both allocated
> > on the stack (with whatever overhead this entails),
>
> I think you meant heap, :-)

Indeed I do.

> > both with four bytes reference count and four bytes pointer to the
> > type object, plus the contents which are again at least four bytes
> > each, plus one byte for each character in the variable name
>
> Ok, that's 12 bytes for the int and at least 16 for the string because
> in addition to the refcount and type it also has a cached 32bit hash
> value and a pointer to the interned version of the string object.
> Caching the hash value and the interned string can be switched off, but
> let's assume that the defaults are used.
>
> > - and that isn't counting the extra storage needed for the entry in
> > the dictionary (another eight bytes on average at least). That's worse
> > than 64:1. Even if the C version uses four bytes for the integer, it's
> > 16:1.
>
> Now, where does the 64 come from? Assuming that the string fits into 32
> bytes (which means it has at most 16 characters including the trailing
> 0) and adding the dict entry, I get 52 bytes. Ok, counting in a bit of
> malloc overhead and allowing for even longer variable names we get about 64
> bytes. This estimation assumes that we're talking about global variables
> or instance/class variables; local variables aren't usually stored in a
> dict.

My mistake.  Apparently I multiplied by two once too often.
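
For reference, the arithmetic behind the 52 bytes can be written out as a
small sketch.  The sizes are the ones quoted in this thread for a 32-bit
build, not measurements, and the constant names are made up for illustration.

```python
# Sizes in bytes as quoted in this thread (32-bit build); these are the
# thread's estimates, not measured values. All names here are hypothetical.
INT_OBJECT = 4 + 4 + 4          # refcount + type pointer + integer value
STRING_HEADER = 4 + 4 + 4 + 4   # refcount + type pointer + cached hash + interned pointer
NAME_CHARS = 16                 # up to 16 characters including the trailing 0
DICT_ENTRY = 8                  # dictionary entry, as estimated in the thread


def bytes_per_named_int():
    """Estimated cost of one named integer stored in a dictionary."""
    return INT_OBJECT + STRING_HEADER + NAME_CHARS + DICT_ENTRY


total = bytes_per_named_int()
print(total)        # 52
print(total / 1)    # ratio against a one-byte C variable
print(total / 4)    # ratio against a four-byte C int
```

Under these assumptions the ratio is 52:1 against a single byte and 13:1
against a four-byte int, before malloc overhead.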

> However, the strings used for variable names and other identifiers are
> interned so if you use the same variable names in several places the
> same string objects are used. Plus, for the ints -1 to 99 you always get
> the same objects. This kind of object sharing can reduce the memory
> requirements drastically and it makes it very hard to estimate just how
> much memory will be needed to hold a certain data structure.

This is very convenient.  I retract my statement: speed is a much bigger
problem than memory usage.
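
The sharing described above can be observed directly.  The following is a
minimal sketch, assuming a current CPython 3, where interning is exposed as
sys.intern (at the time of this thread it was the builtin intern) and where
the exact range of preallocated small integers differs from the -1 to 99
quoted above.  The helper names are hypothetical.

```python
import sys


def shared_name(text):
    # Build each copy at run time so the compiler cannot merge them, then
    # intern both: sys.intern returns one canonical object per string value.
    first = sys.intern("".join(list(text)))
    second = sys.intern("".join(list(text)))
    return first is second


def shared_small_int(value):
    # CPython implementation detail (an assumption, not a language
    # guarantee): small integers are preallocated, so two separately
    # computed results can refer to the same object.
    first = int(str(value))
    second = int(str(value))
    return first is second


print(shared_name("counter"))   # True
print(shared_small_int(42))     # implementation dependent
```

Only the string result is guaranteed by the language; the integer result
depends on the interpreter and should not be relied on in real code.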

Most memory is probably not consumed by stand-alone variables, but by large
collections (tuples/lists/dictionaries) and class instances.  For lists and
tuples, individual contents are not named so memory usage is reduced by 50%
or more.  Instances of the same class can probably share the same strings for
attribute names, again drastically reducing memory usage.  Still, a Python
program is likely to use several times as much memory as an equivalent C/C++
program: each dictionary entry alone takes at least 8 bytes.
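
A rough way to compare the containers mentioned above is sys.getsizeof.
This is only a sketch, assuming CPython: getsizeof reports the container
itself and not the objects it refers to, and the numbers vary by version
and platform.

```python
import sys

# 1000 integers in the range 0 to 255, as in the single-byte C example.
values = [i % 256 for i in range(1000)]

as_list = sys.getsizeof(values)           # list: one pointer per element
as_bytes = sys.getsizeof(bytes(values))   # packed: one byte per element, like a C array
as_dict = sys.getsizeof({i: v for i, v in enumerate(values)})  # hashed entries

# Container overhead only; the int objects themselves are not included.
print(as_list, as_bytes, as_dict)
```

On a typical build the packed form is the smallest and the dictionary the
largest, which matches the ordering argued above.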


--
Rainer Deyke (root at rainerdeyke.com)
Shareware computer games           -           http://rainerdeyke.com
"In ihren Reihen zu stehen heisst unter Feinden zu kaempfen" - Abigor




