Jeremy Hylton : weblog : 2003-04-23

How Big Are Persistent Objects?

Wednesday, April 23, 2003

Python objects use a lot of memory. I think people are often surprised at just how much memory they use. Persistent objects in Zope use even more memory than regular objects. One reason is that the objects store extra data in the C struct for the object. Another reason is that each database connection loads independent copies of all the objects.

It's important to minimize the amount of memory that persistent objects use, because we want using ZODB in a Python program to add as little overhead as possible.

It may be more important to minimize the memory used by ghost objects (surrogates). A ghost object is loaded in memory because some other object refers to it, but it is a ghost because it hasn't been accessed recently. The only reason a ghost is in memory is to preserve pointer equality, so it seems wasteful for a ghost to use a lot of memory.
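To make the pointer-equality point concrete, here is a minimal C sketch of an oid-to-object table. It is an illustration under stated assumptions, not ZODB's cache: `Obj`, `Cache`, `cache_get`, and the fixed capacity are all hypothetical. The first reference to an oid creates a ghost; every later reference to the same oid gets the same pointer back, which is exactly the property a ghost exists to preserve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_CAPACITY 64 /* hypothetical fixed size for the sketch */

/* Hypothetical in-memory object; a ghost has no attribute state loaded. */
typedef struct {
    uint64_t oid;
    int is_ghost;
} Obj;

/* Hypothetical oid -> object table (linear search keeps the sketch short). */
typedef struct {
    Obj *slots[CACHE_CAPACITY];
    size_t count;
} Cache;

/* Return the object for oid, creating a ghost on the first reference.
   Every lookup for the same oid returns the same pointer, so identity
   comparisons between references hold before any state is loaded. */
static Obj *cache_get(Cache *cache, uint64_t oid)
{
    for (size_t i = 0; i < cache->count; i++) {
        if (cache->slots[i]->oid == oid)
            return cache->slots[i];
    }
    assert(cache->count < CACHE_CAPACITY);
    Obj *ghost = malloc(sizeof *ghost);
    if (ghost == NULL)
        return NULL;
    ghost->oid = oid;
    ghost->is_ghost = 1; /* only identity is kept, no attributes */
    cache->slots[cache->count++] = ghost;
    return ghost;
}
```

Two lookups of the same oid, e.g. `cache_get(&c, 7)` twice, return one pointer, so whatever memory that ghost holds is the price of keeping `a == b` true.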

The memory usage for a persistent object stacks up like this:

Bytes  Usage
   12  Python GC header
    8  PyObject_HEAD
   20  PyPersist_HEAD (data mgr, oid, serial, atime, state)
    8  Pointer to instance dict
    8  Pointer to weakref list
  144  Instance dict (holding up to 5 attrs)
   28  PyStringObject for 8-byte oid
   28  PyStringObject for 8-byte serial number
  ---
  256  Total
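As a quick check of the arithmetic, this C sketch copies the rows of the table above and sums them. The figures come straight from the post; the code does not measure anything, and the `Cost` type and `total_bytes` helper are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the per-object cost table (figures copied from the post). */
typedef struct {
    int bytes;
    const char *usage;
} Cost;

static const Cost persistent_object_costs[] = {
    {12, "Python GC header"},
    {8, "PyObject_HEAD"},
    {20, "PyPersist_HEAD (data mgr, oid, serial, atime, state)"},
    {8, "Pointer to instance dict"},
    {8, "Pointer to weakref list"},
    {144, "Instance dict (holding up to 5 attrs)"},
    {28, "PyStringObject for 8-byte oid"},
    {28, "PyStringObject for 8-byte serial number"},
};

#define N_COSTS (sizeof persistent_object_costs / sizeof persistent_object_costs[0])

/* Add up the byte counts of the first n rows. */
static int total_bytes(const Cost *costs, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += costs[i].bytes;
    return total;
}
```

The instance dict alone accounts for 144 of the 256 bytes, more than half of the total.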

The C code from persistence.h is:

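#define PyPersist_HEAD \
    PyObject *po_dm;               /* data manager (database connection) */ \
    PyObject *po_oid;              /* object id */ \
    PyObject *po_serial;           /* serial number of the loaded revision */ \
    int po_atime;                  /* access time, used for LRU eviction */ \
    enum PyPersist_State po_state; /* persistence state (enum declared elsewhere) */

typedef struct { PyPersist_HEAD } PyPersistObject;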

There is other overhead, too. obmalloc rounds each allocation up to a multiple of 8 bytes, so each of the two 28-byte strings actually uses 32 bytes. The GC header is declared with a long double to force alignment, so on some platforms it takes 16 bytes rather than 12. In the worst case, these effects add 12 more bytes per object: 4 for each string and 4 for the GC header.
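The padding arithmetic can be sketched in C. `round_up` and `worst_case_padding` are illustrative helpers, not obmalloc code, and the 16-byte GC header assumes a platform where long double occupies 16 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Worst-case extra bytes per object from allocator and alignment padding:
   two 28-byte strings rounded to obmalloc's 8-byte granularity, plus a GC
   header that grows from 12 to 16 bytes (assumed 16-byte long double). */
static size_t worst_case_padding(void)
{
    size_t string_extra = 2 * (round_up(28, 8) - 28); /* 2 * 4 = 8 */
    size_t gc_extra = round_up(12, 16) - 12;           /* 16 - 12 = 4 */
    return string_extra + gc_extra;
}
```

On platforms where the GC header stays at 12 bytes, only the 8 bytes of string rounding remain.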

A ghost is smaller because the instance dict is freed, but it still takes 96 bytes.

There are a few things we can do to make objects take less space, and we'll probably work on them soon. First, we can use Python longs for oids and serial numbers instead of strings. In general, a 64-bit long takes as much space as an 8-byte string, but OIDs are usually allocated densely, so their values tend to need far fewer than 64 bits. A 30-bit OID fits in a long that takes only 16 bytes. Using longs instead of strings for both the oid and the serial number will save 24 bytes (32 including obmalloc overhead).
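Here is a small C sketch of where the 16-byte figure and the savings come from. The layout constants are an assumption about longs in a 32-bit CPython 2.x build (a 12-byte variable-size header plus 2-byte digits holding 15 bits each); the post does not spell them out, and `long_bytes` and `savings_for_two` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of a long in a 32-bit CPython 2.x build (illustrative
   constants, not measured): a 12-byte variable-size object header followed
   by 2-byte digits that each hold 15 bits of the value. */
#define LONG_HEADER_BYTES 12
#define LONG_DIGIT_BYTES 2
#define LONG_DIGIT_BITS 15

/* Bytes used by a long whose magnitude needs `bits` bits. */
static size_t long_bytes(unsigned bits)
{
    size_t digits = (bits + LONG_DIGIT_BITS - 1) / LONG_DIGIT_BITS;
    return LONG_HEADER_BYTES + digits * LONG_DIGIT_BYTES;
}

/* Bytes saved by replacing both the oid and serial strings (string_bytes
   each) with longs of the given bit length. */
static size_t savings_for_two(unsigned bits, size_t string_bytes)
{
    assert(long_bytes(bits) <= string_bytes);
    return 2 * (string_bytes - long_bytes(bits));
}
```

Passing 28 gives the raw saving of 24 bytes; passing 32 (the string size after obmalloc rounding) gives 32, since a 16-byte long is already a multiple of 8.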

Another space savings is to take the serial number out of the C struct for the object. In general, it's good to keep data out of the C struct, because a ghost still uses whatever memory was initially allocated for it. A ghost does not have a serial number. (A serial number is associated with the revision of the object that was loaded from the database when the object was unghosted.) If the serial number is stored in the instance dict instead, we will save 4 bytes in the object header. It will make using slots more complicated, but I think that's okay.
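The 4-byte figure can be checked with a C sketch that models the 32-bit header on any host by using `uint32_t` for 4-byte pointers. `HeadCurrent` and `HeadNoSerial` are hypothetical stand-ins for the layout, not the real persistence.h types.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the current 32-bit header: uint32_t stands in for a 4-byte pointer. */
typedef struct {
    uint32_t dm;     /* data manager pointer */
    uint32_t oid;    /* oid object pointer */
    uint32_t serial; /* serial object pointer */
    int32_t atime;   /* LRU access time */
    int32_t state;   /* enum PyPersist_State, int-sized */
} HeadCurrent;

/* Same header with the serial number moved into the instance dict. */
typedef struct {
    uint32_t dm;
    uint32_t oid;
    int32_t atime;
    int32_t state;
} HeadNoSerial;
```

Because a ghost keeps whatever C struct it was allocated with, the smaller header helps ghosts as well as loaded objects.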

A third savings is to eliminate or shrink the po_atime slot, which is currently used to provide LRU cache eviction. The po_atime and po_state slots are both ints, so together they consume 8 bytes. We should be able to fit the state in a single byte, and perhaps eliminate po_atime. It's hard to know how to replace po_atime. ZODB3 uses a doubly-linked list in the C struct, but that adds 8 bytes of overhead to every object. One possibility is to use Corbato's CLOCK algorithm, which is what Thor's adaptive cache uses. (The details are different in Thor, because it is page-based, while ZODB is object-based.)
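For comparison with po_atime, here is a minimal C sketch of the CLOCK (second-chance) policy over a fixed array of cache slots. It is an illustration only: the `Entry` and `Clock` types and the `clock_touch`/`clock_evict` functions are hypothetical, and the sketch ignores how Thor adapts the idea. It shows that each object needs only a reference bit stored next to a one-byte state, rather than an int timestamp or two list pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact per-object fields: a one-byte state plus a
   reference flag that replaces the int po_atime. */
typedef struct {
    uint8_t state;      /* e.g. ghost vs. loaded; values are illustrative */
    uint8_t referenced; /* set on access, cleared as the hand passes */
} Entry;

typedef struct {
    Entry *entries;
    size_t n;
    size_t hand; /* current position of the clock hand */
} Clock;

/* Record an access: just set the reference bit, no timestamp to update. */
static void clock_touch(Clock *c, size_t i)
{
    c->entries[i].referenced = 1;
}

/* Choose a victim: sweep the hand, clearing reference bits, and stop at the
   first entry whose bit is already clear. Without concurrent touches this
   finishes within two sweeps, since one sweep clears every bit. */
static size_t clock_evict(Clock *c)
{
    assert(c->n > 0);
    for (;;) {
        Entry *e = &c->entries[c->hand];
        size_t victim = c->hand;
        c->hand = (c->hand + 1) % c->n;
        if (!e->referenced)
            return victim;
        e->referenced = 0; /* give it a second chance */
    }
}
```

For example, with four slots where only slots 0 and 1 were touched, the first eviction clears their bits and picks slot 2.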

It is possible to use slots to eliminate the instance dict, along with the instance dict pointer and the weakref list pointer. On the other hand, if the serial number moves into the instance dict, a class that uses slots must explicitly include _p_serial in its __slots__. Perhaps there should be a base class for persistent classes with slots that does this automatically.



Copyright © 2003 by Jeremy Hylton