data size

Ken Seehof kseehof at neuralintegrator.com
Sat Nov 10 06:50:13 EST 2001


> > He is probably asking you which C library your version of Python was
> > compiled with. But you don't need to know that, either.
> >
> > All Python dictionaries are a standard 2.5 cm by 3.6 cm. Integers have no
> > width and are all 1.2 cm in length. Strings are all 2 mm times the number
> > of characters, except Unicode strings, which are 4 mm times the number of
> > characters.
>
> could you explain further about the metric standard you're using. this is
> the first time a size of data structure is measured using meters instead of
> byte/bit.
> i need the information for my post-mortem of my assignment to explain why
> using python data structure would be efficient. yes, i'm only a student who
> is still need to learn lots of stuffs.
> thanks.

Okay, so everyone's explained why you don't care what the answer to
your question is :-).  Actually, the size of a data structure does sometimes
matter, specifically when you are dealing with particularly huge quantities
of data.  For example, Python genetic molecular simulators usually store
the entire human genome in a dictionary in memory.  Don't they?  :-)

It is generally more difficult to figure out memory usage analytically in
Python than in C, so what I do in this kind of situation is the empirical
thing.

>>> import random
>>> def makedict(x):
...  d = {}
...  for i in xrange(x):
...   d[random.randint(100,1000000000)] = \
...    random.randint(100,1000000000)
...  return d

>>> a = makedict(1024*1024)
>>> b = makedict(1024*1024)
>>> del a
>>> del b
>>> ... etc....
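
If you'd rather not stare at a memory monitor, here is a rough sketch of
the same experiment.  It assumes a Python 3 interpreter, where the standard
tracemalloc module is available; bytes_per_entry is a hypothetical helper
name of my own, and the number it reports depends on your build and
platform, so don't expect it to match the figures in this post.

```python
import random
import tracemalloc

def makedict(n):
    """Build a dict with (about) n random integer keys and values."""
    d = {}
    for _ in range(n):
        d[random.randint(100, 1000000000)] = random.randint(100, 1000000000)
    return d

def bytes_per_entry(n):
    """Estimate the memory cost per dictionary entry, in bytes.

    Hypothetical helper: it measures the Python-level allocations made
    while the dictionary is built, then divides by the number of entries.
    """
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    d = makedict(n)
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # len(d) can be slightly less than n if random keys collide.
    return (after - before) / len(d)

print(bytes_per_entry(100000))
```

The count includes both integer objects per entry as well as the hash
table itself, which is the same thing the memory monitor is showing.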

By watching my memory monitor, I determined that the dictionary costs
about 32 bytes per entry (give or take a byte).  It doesn't really matter
much what the bytes are used for, but if you are in the mood to get
analytical...

#define PyObject_HEAD \
 int ob_refcnt; \
 struct _typeobject *ob_type;

A Python object is 8 bytes plus data (with 4-byte ints and pointers).
That's 12 bytes per integer (note that integers are 0 bytes for
-3 < n < 100, since small integers are preallocated and shared).  I'd
expect the hash table to cost about 3 pointers per entry for a
well-balanced hash table.  That's 12 bytes.  So an entry in our dictionary
(2 integers and a hash index entry) should be 12 + 12 + 12 = 36 bytes,
against the 32 I measured.  So, I'm wondering, where'd the extra four
bytes go?  (well maybe I'm just being sloppy...)
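
For the record, here is that back-of-envelope arithmetic spelled out.  The
constants are assumptions taken from the estimate above (a 32-bit build
with 4-byte ints and pointers, and my guess of 3 pointers per hash entry),
not values read out of the interpreter.

```python
# Assumptions from the post: 32-bit build, 4-byte int and 4-byte pointers.
POINTER = 4
OBJECT_HEAD = 4 + POINTER       # ob_refcnt + ob_type
INT_OBJECT = OBJECT_HEAD + 4    # object header plus a 4-byte integer value
HASH_ENTRY = 3 * POINTER        # guess: about 3 pointers per table entry

estimate = 2 * INT_OBJECT + HASH_ENTRY   # key + value + hash index entry
measured = 32                            # from watching the memory monitor

print(estimate, estimate - measured)     # 36 4
```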

Remember to multiply all of your results by 2.7 mg/farad.

- Ken Seehof
kseehof at neuralintegrator.com







