Jeremy Hylton's Web Log, April 2003

Debugging memory leaks

permanent link
Thursday, April 10, 2003

Tim and I have spent much of the last week debugging memory leaks in Zope. All the leaks were found by running unit tests or running tiny demo scripts in a loop. A few weeks ago, I also wrote a trivial benchmark for ZODB4. I just wanted to measure how quickly I could create persistent mail message objects and add them to a b-tree index. One conclusion is that functional testing of this sort is really important. None of the stress tests were hard to set up, but they found a large number of leaks and performance problems. The code base is looking much better now.

The chase has led us all over Zope and Python. The Python garbage collector needed some significant changes in the way it looked for objects with finalizers.

If the collector finds a cycle of objects that contains a finalizer, it does not collect the objects in the cycle. If there are multiple finalizers, there is no way for the language to know what order to invoke them in. Each finalizer might depend on the state of the other objects, so that whichever one runs first could see a partially finalized object. Guido made one helpful observation, though: if a cycle has exactly one finalizer, it's always safe to run it.
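A small demonstration of the policy (my example, assuming Python 2.3's gc module): a cycle of objects with __del__ methods is not freed; the collector parks it in gc.garbage instead.

import gc

class Node:
    def __del__(self):
        pass  # the mere presence of a finalizer is what matters

a = Node()
b = Node()
a.other = b   # create a reference cycle
b.other = a
del a, b

gc.collect()
print gc.garbage  # the uncollectable cycle ends up here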

The problem was that Python was using C code roughly equivalent to hasattr(obj, "__del__"). The danger is that the hasattr could execute an arbitrary amount of Python code, because of a getattr hook or a custom descriptor. It's not safe to run any Python code during garbage collection, because it could deallocate objects or make previously unreachable objects reachable again. We actually ran into both cases in the ZODB code that was failing.

The fix for the garbage collector ended up being quite satisfying, although it took quite a while to diagnose the problem and understand the right way to fix it. The key idea is that a finalizer (__del__ method) must always be defined by an object's class. An __del__ attribute on an object is not treated as a finalizer. Since the class defines the finalizer, we can look in the dictionaries of base classes for the method without executing any Python code. One corner case of interest is that if you find a descriptor for __del__, then you assume the object has a finalizer without actually calling the descriptor. (In most cases, if you find a descriptor, you call it right away.)
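A quick illustration of the rule (my example): for new-style classes, a __del__ attribute set on the instance is simply ignored.

import sys

class C(object):
    pass

c = C()
c.__del__ = lambda: sys.stdout.write("never runs\n")
del c  # prints nothing: only a __del__ defined by the class counts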

There are a bunch of handy techniques for debugging memory leaks that ought to be collected somewhere. Perhaps the most important idea is that if you run an isolated code fragment in a loop and run the garbage collector each time around the loop, the sum of all reference counts should stay the same. (An isolated fragment is one that doesn't make objects reachable from some external object, like sys.modules.) In practice, it will take a few iterations for the code to settle down. The first time through, it will probably import some module or do other initialization.

A debug build of Python provides valuable tools for deciding if the code fragment is leaking. The function sys.gettotalrefcount() returns the sum of all object reference counts (only in a debug build). If this number goes up, then you are leaking references at the very least. The function sys.getobjects() returns a list of all the objects in the interpreter, with the most recently allocated objects first. If the number of objects increases each time around the loop, you are leaking objects. This list provides a good way to analyze the leak. Tim wrote a routine that groups objects by type and reports how many objects of each type were created since the last call to the routine.
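Here is a minimal sketch of the loop technique, assuming a debug build of Python; check_leaks and fragment are names I made up for illustration.

import gc, sys

def check_leaks(fragment, n=10):
    # Run an isolated code fragment repeatedly.  After a few warm-up
    # iterations, the totals should stop growing if nothing leaks.
    for i in range(n):
        fragment()
        gc.collect()
        print i, sys.gettotalrefcount(), len(sys.getobjects(0))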

If an object leaks, it's a bug in C code somewhere. Either a C extension or some part of the Python core is missing a Py_DECREF() call. Knowing the type of the object may be enough to tell you where to look for the ref count problem.

In practice, a missing decref usually causes a group of objects to be leaked. It can be hard to figure out which object is responsible for the leak. One technique that proved helpful was to print the reference count and referrers for each object that was leaking.

import gc, sys

for o in leaky_objects:
    print hex(id(o)), type(o), sys.getrefcount(o), len(gc.get_referrers(o))

Normally, the number of references will be one greater than the number of referrers. (There is an extra, temporary reference created when getrefcount() is called.) If the difference between the refcount and the number of referrers is more than one, there is an external reference to the object. That is, a reference from an object the garbage collector can't find. Some code forgot to decref this object.

Metaclass for struct-like classes

permanent link
Monday, April 14, 2003

I just noticed a reference to Alex Martelli's MetaBunch class. It looks like a useful helper, and would make a good addition to Demo/newmetaclasses/.

The point is: if you use a lot of "struct-like" classes, metaMetaBunch offers you a nice and convenient way to define them, with clean syntax and reasonable default semantics (__init__ and __repr__).
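Here is a minimal sketch in the spirit of the recipe (my reconstruction, not Alex's code): plain class attributes become per-instance defaults, and the metaclass synthesizes __init__ and __repr__.

class MetaBunch(type):
    def __new__(mcl, name, bases, d):
        # class attributes that don't start with '_' become defaults
        defaults = dict([(k, v) for k, v in d.items()
                         if not k.startswith('_')])
        def __init__(self, **kw):
            for k, v in defaults.items():
                setattr(self, k, kw.get(k, v))
        def __repr__(self):
            pairs = ['%s=%r' % (k, getattr(self, k)) for k in defaults]
            return '%s(%s)' % (name, ', '.join(pairs))
        d['__init__'] = __init__
        d['__repr__'] = __repr__
        return type.__new__(mcl, name, bases, d)

class Point:
    __metaclass__ = MetaBunch
    x = 0
    y = 0

print Point(x=3)  # prints something like Point(x=3, y=0)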

Packing a storage

permanent link
Wednesday, April 16, 2003

Rys McCusker asked why every storage implementation needs to provide its own garbage collection (via the pack() method). A garbage collector is hard to implement, so it would be better to implement it once and let all the storages share that implementation. I didn't have a good answer when he asked, but I've spent about a week struggling with pack() and have one now.

The garbage collector implements the same policy for each storage: a pack, given a specified pack time, allows the storage to reclaim space for revisions that are only needed to support undo of transactions before the pack time. That is, a revision can be reclaimed if it is neither the current revision as of the pack time nor referenced as a result of an undo or version operation after the pack time. An efficient implementation of this policy depends on all sorts of details of the storage implementation.

I have been working on a new pack implementation for FileStorage for the last several days. The current implementation is very complicated and seems impossible to maintain. As a result, I've resisted any substantial refactoring. But we've found several bugs recently and small patches to fix those bugs just increase the complexity. No choice but to start from scratch and try to do the simplest thing.

The pack implementation that Barry came up with for Berkeley storages is fairly simple, but can't be used for FileStorage at all. The chief difference is that Berkeley has data structures that can be updated in place. The storage can add and remove elements from a Berkeley table as needed. FileStorage is an append-only log, so it isn't practical to maintain auxiliary data structures. BerkeleyDB also allows these data structures to be moved between memory and disk transparently. In FileStorage, data structures like the index are always in memory.

Berkeley storage uses reference counting to do most of the work. The data and references for an object revision are stored in a separate table. Every transaction that uses the same data has a reference to the entry in the pickle table. (In FileStorage, there is a backpointer to the first transaction that wrote the pickle.) When a transaction is removed by pack, the reference count is decremented. (The Berkeley pack uses a mark-and-sweep phase to handle unreachable objects.)

The new FileStorage pack implementation starts by determining which data records written before the pack time are reachable. A FileStorage is packed by copying all the reachable records from the original file into a new file. Reachability is harder to determine than you might expect, because the undo and versions features create references to object revisions that are otherwise unreachable. Tim helped me develop a straightforward reachability pass. It still needs revision to handle versions and to be space efficient.
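The mark phase might look something like this sketch; load_as_of and references_in are hypothetical helpers, not real FileStorage APIs.

def reachable_oids(storage, pack_time, root_oid):
    # Walk the object graph from the root as of the pack time,
    # collecting the set of reachable oids.
    reachable = {}
    todo = [root_oid]
    while todo:
        oid = todo.pop()
        if oid in reachable:
            continue
        reachable[oid] = 1
        data = storage.load_as_of(oid, pack_time)  # hypothetical
        todo.extend(references_in(data))           # hypothetical
    return reachable

Undo and versions are what make the real pass harder: records written after the pack time can point back at pre-pack revisions, so those backpointers have to be treated as extra roots.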

Client-server access to ZODB

permanent link
Friday, April 18, 2003

One component of ZODB is ZEO, a client-server storage layer that allows many clients to share a single database. Client-server support is a critical feature. Even if you only expect a single client during normal operation, it is useful to connect a second client for debugging.

The current ZEO system (2.0) works well enough, but it is fairly complex and difficult to extend. If I had free rein to start over, there are a bunch of interesting possibilities to consider.

RPC protocol

The current ZEO implementation uses version two of a homegrown RPC protocol. One serious problem with the current scheme is that it uses cPickle to serialize objects before sending them over the wire. cPickle is very efficient for serializing small objects. Since all the logic of serialization is in C, it's cheap to construct the pickle. But pickle does badly for large string objects -- like the objects returned from load(). The pickler creates a copy of the string. So to return a very large object from ZEO, we need to load the object into memory, then copy it, then send it out over asyncore. Sending it via asyncore will almost certainly break it up into little pieces. The client has to do the same thing on unpickling. This can lead to excess memory usage or outright failures (MemoryErrors) when fragmentation makes it impossible to allocate a big enough buffer.
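A tiny illustration of the copying cost (the sizes are arbitrary): pickling a large string materializes a second full-sized buffer before a single byte hits the wire.

import cPickle

big = 'x' * (16 * 1024 * 1024)  # stand-in for a 16 MB object record
wire = cPickle.dumps(big, 1)    # a second ~16 MB copy, plus framing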

Most of the ZEO protocol involves asynchronous methods. The store() method, for example, is asynchronous. This ends up complicating the storage API, because the storage must return a new serial number to the client. The asynchrony is important, because it improves performance. An interesting middle ground would be to extend the RPC layer with support for promises.
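A minimal sketch of the promise idea; nothing like this exists in ZEO 2.0, and all the names are invented. The caller gets a placeholder back immediately and blocks only when it needs the value.

import threading

class Promise:
    # A one-shot container: one thread delivers the value, another
    # thread collects it whenever it gets around to asking.
    def __init__(self):
        self._ready = threading.Event()
        self._value = None
    def resolve(self, value):
        self._value = value
        self._ready.set()
    def wait(self):
        self._ready.wait()
        return self._value

A store() built on this could hand back a Promise for the new serial number, letting the client keep pipelining calls and only call wait() at commit time.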

Given the existing problems and the paucity of time available to work on it, a better strategy is probably to use an existing solution for the communications layer. The two most promising candidates are probably Twisted's perspective broker and omniORB for Python.

Client cache

The client cache is critical for good ZEO performance. The cache keeps copies of recently used objects on disk, so that subsequent accesses don't require a trip to the server.

Guido developed a trace-driven simulator for the ZEO cache that led to a number of improvements. The current cache keeps two data files. Every time an object is loaded, it is saved in the current data file. When the current data file fills up, the cache deletes the other data file and starts with a new, empty current file. It's a very simple scheme (perhaps too simple), but it works well enough in practice if the cache is very large. Guido's simulator also showed that a buddy cache would work well.
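A sketch of the two-file scheme, using in-memory dicts as stand-ins for the on-disk data files:

class TwoFileCache:
    def __init__(self, limit):
        self.limit = limit   # per-file size budget, in bytes
        self.current = {}    # stand-in for the current data file
        self.other = {}      # stand-in for the other data file
        self.size = 0
    def store(self, oid, data):
        self.current[oid] = data
        self.size += len(data)
        if self.size > self.limit:
            # current is full: delete the other file and flip
            self.other = self.current
            self.current = {}
            self.size = 0
    def load(self, oid):
        if oid in self.current:
            return self.current[oid]
        data = self.other.get(oid)
        if data is not None:
            self.store(oid, data)  # recently hit objects migrate back
        return data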

Prefetching

Prefetching could improve the performance of the client cache. It's also possible to use a page-based cache system to achieve some benefits over the object-based cache we've got now.

If we can anticipate client access patterns, we can fetch objects before they are needed. The prefetched objects could be delivered in the background while the application is running. Or the server could send a group of objects in response to a request for one.

One need the Chandler project identified for its ZODB integration was the ability to exploit their server's bulk fetch API; for example, prefetching all the objects in a query's result set.

Mark Day's thesis suggests that a small amount of structural caching could be a good thing, where structural caching means that the structure of application objects is exploited to inform caching. We would need to revise ZEO to allow clients to receive objects asynchronously.

How Big Are Persistent Objects?

permanent link
Wednesday, April 23, 2003

Python objects use a lot of memory. I think people are often surprised at just how much memory they use. Persistent objects in Zope use even more memory than regular objects. One reason is that the objects store extra data in the C struct for the object. Another reason is that each database connection loads independent copies of all the objects.

It's important to minimize the amount of memory that persistent objects use, because we want ZODB to impose as little extra overhead as possible on ordinary Python programs.

It may be more important to minimize the memory for ghost objects (surrogates). A ghost object is loaded in memory because some other object refers to it, but it is a ghost because it hasn't been accessed recently. The only reason a ghost is in memory is to preserve pointer equality, so it seems wasteful if a ghost uses a lot of memory.

The memory usage for a persistent object stacks up like this:

Bytes  Usage
   12  Python GC header
    8  PyObject_HEAD
   20  PyPersist_HEAD (data mgr, oid, serial, atime, state)
    8  Pointer to instance dict
    8  Pointer to weakref list
  144  Instance dict (holding up to 5 attrs)
   28  PyStringObject for 8-byte oid
   28  PyStringObject for 8-byte serial number
  256  Total

The C code from persistence.h is:

#define PyPersist_HEAD   \
    PyObject_HEAD        \
    PyObject *po_dm;     \
    PyObject *po_oid;    \
    PyObject *po_serial; \
    int po_atime;        \
    enum PyPersist_State po_state;

typedef struct { PyPersist_HEAD } PyPersistObject;

There is other overhead, too. obmalloc will round up to an 8-byte offset for each malloc'd chunk. So the two strings actually use 32 bytes. And the GC header gets aligned with a long double, so on some platforms it will actually be 16 bytes. In the worst case, that's 12 more bytes.

A ghost is smaller because the instance dict is freed, but it still takes 96 bytes.

There are a few things we can do to make objects take less space. We'll probably work on that soon. First, we can use Python longs for oids and serial numbers instead of strings. In general, a 64-bit long takes as much space as an 8-byte string. OIDs are usually densely allocated, so they tend to be much smaller than 64 bits. A 30-bit OID will fit in a long that takes only 16 bytes. Using longs instead of strings will save 24 bytes (32 including obmalloc overhead).

Another space savings is to take the serial number out of the C struct for the object. In general, it's good to keep data out of the C struct, because a ghost still uses whatever memory was initially allocated for it. A ghost does not have a serial number. (A serial number is associated with the revision of the object that was loaded from the database when the object was unghosted.) If the serial number is stored in the instance dict instead, we will save 4 bytes in the object header. It will make using slots more complicated, but I think that's okay.

A third savings is to eliminate or shrink the po_atime slot that is currently used to provide LRU cache eviction. The po_atime and po_state slots are both ints, so they consume a total of 8 bytes. We should be able to fit the state in a single byte, and perhaps eliminate po_atime. It's hard to know how to replace the po_atime. ZODB3 uses a doubly-linked list in the C struct, but that adds 8 bytes of overhead to all objects. One possibility is to use Corbato's CLOCK algorithm, which is what Thor's adaptive cache uses. (The details are different in Thor, because it is page-based, while ZODB is object-based.)
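For reference, here is a minimal sketch of CLOCK eviction for an object cache with a fixed number of slots (all the names are mine): a hit just sets a reference bit, and the sweeping hand evicts the first entry whose bit is already clear.

class ClockCache:
    def __init__(self, nslots):
        self.slots = [None] * nslots  # (key, value) pairs or None
        self.refbit = [0] * nslots
        self.index = {}               # key -> slot number
        self.hand = 0
    def access(self, key, value):
        i = self.index.get(key)
        if i is not None:
            self.refbit[i] = 1        # a hit costs one bit, not a list move
            return
        # Miss: sweep past recently used entries, clearing their bits.
        while self.slots[self.hand] is not None and self.refbit[self.hand]:
            self.refbit[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.slots)
        victim = self.slots[self.hand]
        if victim is not None:
            del self.index[victim[0]]
        self.slots[self.hand] = (key, value)
        self.refbit[self.hand] = 1
        self.index[key] = self.hand
        self.hand = (self.hand + 1) % len(self.slots)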

It is possible to use slots to eliminate the need for the instance dict, as well as the instance dict pointer and the weaklist pointer. On the other hand, if you use slots, you need to explicitly include _p_serial in the list. Perhaps there should be a base class for persistent classes with slots that does this automatically.
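Such a base class could be as simple as this sketch (assuming ZODB4's persistence package; SlottedPersistent is a made-up name): declaring _p_serial once in the base class gives every slotted subclass the slot without having to list it.

from persistence import Persistent  # ZODB4's package

class SlottedPersistent(Persistent):
    __slots__ = ('_p_serial',)  # inherited by all slotted subclasses

class Point(SlottedPersistent):
    __slots__ = ('x', 'y')      # no need to repeat _p_serial here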


A Descriptor Tutorial

permanent link
Friday, April 25, 2003

It is funny that Guido's new-style class tutorial is called descrintro.html but doesn't really describe descriptors. It mentions them only at the end, in the list of additional topics that should be discussed. Descriptors are discussed in detail in PEP 252, but it is hard to read.

A descriptor is an object in a class dictionary. It describes how to find an attribute on an instance. For example, a property descriptor refers to functions that should be called when code attempts to get or set the property. A descriptor allows arbitrary code to be executed when a specific attribute is accessed or set. You can think of it as a getattr or setattr hook for a specific attribute.
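A toy example (the Logged class is mine): the descriptor's methods run on every access to the one attribute it manages.

class Logged(object):
    def __init__(self, name, default):
        self.name = name
        self.default = default
    def __get__(self, obj, cls=None):
        if obj is None:
            return self  # accessed on the class itself
        print "get", self.name
        return obj.__dict__.get(self.name, self.default)
    def __set__(self, obj, value):
        print "set", self.name, "=", value
        obj.__dict__[self.name] = value

class C(object):
    x = Logged("x", 0)

c = C()
c.x = 3     # prints: set x = 3
print c.x   # prints: get x, then 3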

I have written a few descriptors. The most interesting hack is a descriptor used to support persistent code in ZODB4. It allows classes and instances to have separate namespaces for the same attribute. That is, a class and its instances can both have attributes named "foo" and have different values for them.
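The trick hinges on __get__ seeing whether it was reached through the class or an instance; a stripped-down sketch (not the actual ZODB4 code):

class DualValue(object):
    # One name, two namespaces: the class sees one value,
    # instances see another.
    def __init__(self, class_value, instance_value):
        self.class_value = class_value
        self.instance_value = instance_value
    def __get__(self, obj, cls=None):
        if obj is None:
            return self.class_value
        return self.instance_value

class C(object):
    foo = DualValue("class foo", "instance foo")

print C.foo    # class foo
print C().foo  # instance foo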

Samuele Pedroni wrote some impressively arcane rexec attacks that used descriptors to trick trusted code into executing code from the untrusted environment. Sometimes descriptors seem too powerful. They provide unlimited control over what happens when you look up an attribute on an object.

What seems particularly surprising about descriptors is that many simple Python expressions can end up calling descriptors. They can be used for special methods like __call__ or __getitem__.

A descriptor describes how to get and set a specific attribute on an object. Its two interesting methods are __get__() and __set__(). It also has attributes __name__, __doc__, and __objclass__.

The __get__ attribute is a good place to start. It is a method called with one or two arguments to retrieve a value. It is called to access the attribute on the class and on the instance. Whatever __get__() returns is the value of accessing the attribute. The documentation from PEP 252 says this:

__get__(): a function callable with one or two arguments that retrieves the attribute value from an object. This is also referred to as a "binding" operation, because it may return a "bound method" object in the case of method descriptors. The first argument, X, is the object from which the attribute must be retrieved or to which it must be bound. When X is None, the optional second argument, T, should be meta-object and the binding operation may return an *unbound* method restricted to instances of T. When both X and T are specified, X should be an instance of T. Exactly what is returned by the binding operation depends on the semantics of the descriptor; for example, static methods and class methods (see below) ignore the instance and bind to the type instead.

There's probably a nice, short essay to be written on descriptors, but no time to write that now.

Finding the Zope3 HTTP Server

permanent link
Monday, April 28, 2003

My goal for today was to write a simple benchmark for the HTTP server that comes with Zope3. I wanted to test the performance implications of socket changes in Python 2.3. I'd also been meaning to see how Zope3 does on Joe Armstrong's challenge.

The Python 2.3 issue is that a small change to the socket module to support timeouts seems to have a big performance implication for servers. Guido explained the problem on python-dev:

I'm guessing that the slowdown comes from the fact that calling a method like recv() on the wrapper object is now a Python method which calls the C method on the wrapped object.
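Reduced to a sketch (this is not the actual socket.py code), the shape of the problem looks like this; every recv() now pays for an extra Python-level call before reaching C.

class SocketWrapper:
    # stand-in for the wrapper object Python 2.3 interposes
    def __init__(self, realsock):
        self._sock = realsock
    def recv(self, bufsize):
        # one extra Python frame per call, on a very hot path
        return self._sock.recv(bufsize)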

Joe Armstrong's challenge had to do with Erlang's user-level threads. An Erlang server can handle thousands of threads at once and thus supports a very efficient multi-threaded web server. At LL2, one of Joe's challenges was to sustain high throughput as the number of concurrent connections increased.

Anyway, the task at hand is to find the HTTP server code in Zope3. We really need an IDE for Zope3 that integrates Python, TAL, and ZCML. It would be fun to see if Eclipse would work.

I know there's a zope.publisher.http module, but I'm not quite sure what it does. (This ended up being a false start.)

zserver.zcml has a directive for the HTTP server.

     <startup:addServer type="HTTP" port="8080" verbose="true" />

I know a little bit about ZCML directives and I recognize the name startup. That's zope.app.startup -- although I'm not sure how I would have found startup otherwise. It's got a meta.zcml that says

    <directive name="defineSite"
               attributes="name threads"
               handler="zope.app.startup.metaconfigure.defineSite">
      ...
      <subdirective name="addServer"
                    attributes="type port verbose logClass" />
    </directive>

That's a great clue, because I now have some source code to look at. There's one slight annoyance: the defineSite() function is actually an alias for zope.app.startup.sitedefinition.SiteDefinition. Is this just in need of a little refactoring? Or is there a reason for the alias?

The addServer() method isn't very helpful, though. It just adds some configuration strings to a list of servers. The start() method gets a server object using getServerType() and calls its create() method, so that must be where the HTTP server is actually created. The getServerType() function is a little odd; it uses a dictionary for registration but adds two layers of classes, modules, and interfaces to it. In the end, I see that a server must call registerServerType() to make the server configuration work, but I can't find any code that calls registerServerType().

I see that registerServerType is called from ZCML code. (Earlier today, Guido reminded me that with Zope3 code you need to grep Python + ZCML because the only call to a function often occurs in some ZCML.) A grep revealed that the registration was still here in zope.app.startup. The directive of interest points me at zope.server instead of zope.publisher. That makes sense! (The little bit of initial knowledge I had actually hurt, because HTTP shows up as both a server and a publisher.)

  <startup:registerServerType 
    name = "HTTP"
    factory = "zope.server.http.publisherhttpserver.PublisherHTTPServer"
    requestFactory="HTTPRequestFactory"
    logFactory = "zope.server.http.commonhitlogger.CommonHitLogger"
    defaultPort="8080"
    defaultVerbose="true" />

The module zope.server.http.test.test_httpserver has the example code I was looking for. I'll need to study this code a little, because it appears to demonstrate the Zope3 approach to combining threads and the asyncore mainloop. There is also a simple example at the bottom of zope.server.http.httpserver.