Jeremy Hylton : weblog : 2003-09-12

Scaling to 100,000 Threads

Friday, September 12, 2003

A paper at SOSP this year is the latest entry in the long-standing debate about threads versus events. I have always tended to favor the threaded view, primarily because the event-based approach obscures the static control flow of the program.

The paper, by von Behren et al., is about Capriccio, a scalable threads packages for C programming. The basic argument is that several of the problems associated with the threaded approach have a lot to do with implementation issues, e.g. the amount of memory used for each thread's stack.

The Capriccio package is a user-level thread package for Linux that scales to 100,000 threads. Many of the implementation challenges are the result of providing a thread package for C programmers. They perform elaborate compile-time analysis to determine how much stack space is needed and insert checks for dynamic resizing. It is also a challenge to avoid blocking IO operations, which would block the entire process.

An earlier HotOS paper about Capriccio engages more in the debate on threads vs events. It's a nice companion to the implementation paper. The best argument against events is the "stack-ripping" problem (a term coined by Adya et al.). At each point that a blocking call could occur, you need to rip apart the stack and create a closure invoked via callback. I dislike this because it obscures the intent of the program.

Many higher-level languages rely on user-level threads. Erlang achieves amazing scalability for its threads package. PLT Scheme also uses user-level threads, though I don't know anything about scalability.

I need to understand whether Stackless Python provides a similar model. I haven't followed the Stackless 3.0 work closely, so I don't know the state of the threads package. Christian Tismer's slides for EuroPython suggest that you can create many threads, but they aren't intended for blocking IO.

I think a Python threads package for scalable internet services is possible and would be a great project. In Python, the stack management isn't so much of an issue because all the Python frames are heap allocated. Stackless, of course, contends with the places where Python still uses the C stack. (I wonder if you could create a variant Python frame for C functions; the frame would contain a pointer to the C function but could be allocated on the heap.) You would need to provide IO operations, e.g. sockets and threads, that presented a traditional blocking interface implemented on top of nonblocking primitives like poll and AIO. Unfortunately, I'm not aware of any papers that get into the details of these issues.

Stackless Python is currently pursuing a different model for concurrency based on tasklets. I'd like to find out more about them. I suspect they're related to Erlang's four concurrency primitives.

One problem with the current Capriccio implementation is that it doesn't take advantage of multiple processes. It's far more difficult to map M user-level threads onto N kernel threads. (Apparently the events camp has worked on this problem, but I haven't read the recent Zeldovich paper yet.) In a Python implementation, the problem might be worse because of the global interpreter lock.

Papers mentioned

Atul Adya, Jon Howell, Marvin Theimer, Bill Bolosky, John Douceur. Cooperative Task Management Without Manual Stack Management, or Event-driven Programming is not the Opposite of Threaded Programming. In Proceedings of the USENIX Annual Technical Conference, June 2002.

Rob von Behren, Jeremy Condit and Eric Brewer. Why Events Are A Bad Idea (for high-concurrency servers). In Proceedings of the 9th Workshop on Hot Topics in Operating Systems, 2003.

Rob von Behren, Jeremy Condit, Feng Zhou, George C. Necula, and Eric Brewer. Capriccio: Scalable Threads for Internet Services. In Proceedings of the 19th ACM Symposium on Operating Systems Principles, 2003.

John K. Ousterhout. Why Threads Are A Bad Idea (for most purposes). Presentation given at the 1996 Usenix Annual Technical Conference, January 1996.

Nickolai Zeldovich, Alexander Yip, Frank Dabek, Robert T. Morris, David Mazieres, Frans Kaashoek, Multiprocessor Support for Event-Driven Programs, USENIX 2003 Annual Technical Conference, June 2003.