Jeremy Hylton : weblog : 2003-12-08

Generating Custom Bytecode for Restricted Python

Monday, December 08, 2003

One flaw in RestrictedPython was that it did not check the unpacking of tuples, lists, and other iterables in assignment statements, e.g. x, y = L. Today I finished a first cut at a solution that uses the Python compiler package to generate custom bytecode that checks unpack operations.
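To see the operation in question, the standard dis module can list the opcodes CPython emits for an unpacking assignment. This is a small sketch for a current CPython 3 interpreter rather than the 2003-era compiler package, and the helper name opnames is made up for illustration.

```python
import dis


def opnames(source):
    """Return the set of opcode names used by `source`, nested code included."""
    names = set()
    pending = [compile(source, "<example>", "exec")]
    while pending:
        code = pending.pop()
        for instr in dis.get_instructions(code):
            names.add(instr.opname)
        # Nested functions (and, on some versions, comprehensions) are stored
        # as code objects among the constants of the enclosing code object.
        pending.extend(c for c in code.co_consts if hasattr(c, "co_code"))
    return names


# L is a plain name, so the compiler cannot fold the unpacking away.
print("UNPACK_SEQUENCE" in opnames("x, y = L"))
print("UNPACK_SEQUENCE" in opnames("for a, b in pairs: pass"))
```

Any check on unpacking has to cover every place this opcode is emitted, not only plain assignment statements.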

This check is different from the others in RestrictedPython, because it generates custom bytecode instead of re-writing the source tree. Most of the other changes replace operations that would use a bytecode operation like LOAD_ATTR with a call to a Python function that has the same behavior, but also checks security. It's impractical to implement sequence unpacking checks that way, because it would be difficult to generate new source code that has the same effect. Unpacking can occur in the target of a for loop or list comprehension or in the argument list of a function; each of these would require very substantial source-to-source transformations.
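For comparison, here is roughly what the tree-rewriting style looks like for attribute access. It is a minimal sketch using the ast module from current Python 3 as a stand-in for the compiler package's tree. The names guarded_getattr, AttrRewriter, and restricted_eval are hypothetical, and the underscore policy is only an illustration, not RestrictedPython's actual policy.

```python
import ast


def guarded_getattr(obj, name):
    # Illustrative policy only: refuse names that start with an underscore.
    if name.startswith("_"):
        raise AttributeError("access to %r is not allowed" % name)
    return getattr(obj, name)


class AttrRewriter(ast.NodeTransformer):
    """Rewrite `obj.attr` loads into `guarded_getattr(obj, 'attr')` calls."""

    def visit_Attribute(self, node):
        self.generic_visit(node)  # rewrite nested attribute accesses first
        if not isinstance(node.ctx, ast.Load):
            return node  # stores and deletes are left alone in this sketch
        call = ast.Call(
            func=ast.Name(id="guarded_getattr", ctx=ast.Load()),
            args=[node.value, ast.Constant(value=node.attr)],
            keywords=[],
        )
        return ast.copy_location(call, node)


def restricted_eval(source, namespace):
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(AttrRewriter().visit(tree))
    code = compile(tree, "<restricted>", "eval")
    env = dict(namespace, guarded_getattr=guarded_getattr)
    return eval(code, env)


print(restricted_eval("z.real", {"z": 3 + 4j}))
```

An attribute load is a single expression, so swapping it for a call is a local change. An unpacking target has no equivalent expression form to swap in, which is why the same trick does not carry over.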

The bytecode generation is mostly straightforward. One problem is that the visitor method that generates the unpack sequence is called after the value being unpacked is already on the stack. After loading the checking function, the visitor method needs to do a ROT_TWO to get the function and its argument on the stack in the right order.
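The stack shuffle is easier to follow with a toy model. The sketch below simulates the value stack with a Python list; it is not the generated bytecode itself, and the guard checked_iter is a hypothetical placeholder for whatever checking function gets called.

```python
calls = []


def checked_iter(value):
    # Hypothetical guard: a real one would apply a security policy here.
    calls.append(value)
    return list(value)


def run_guarded_unpack(value, count):
    """Simulate: LOAD guard, ROT_TWO, CALL_FUNCTION 1, UNPACK_SEQUENCE count."""
    stack = [value]                # the value is already pushed on entry
    stack.append(checked_iter)     # load the guard    -> [value, guard]
    stack[-1], stack[-2] = stack[-2], stack[-1]  # ROT_TWO -> [guard, value]
    arg = stack.pop()              # CALL_FUNCTION 1 pops the argument,
    func = stack.pop()             # then the callable beneath it,
    stack.append(func(arg))        # and pushes the result
    items = stack.pop()            # UNPACK_SEQUENCE pops the sequence
    if len(items) != count:
        raise ValueError("expected %d items, got %d" % (count, len(items)))
    stack.extend(reversed(items))  # first item ends up on top of the stack
    # The stores that follow pop one value per target, left to right.
    return [stack.pop() for _ in range(count)]


x, y = run_guarded_unpack((1, 2), 2)
print(x, y)
```

Without the ROT_TWO, the call would find the value where it expects the callable, since the callable has to sit below its argument.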

I need to re-organize the code generator classes in the compiler package. There is a lot of complexity that exists primarily to allow the compiler to generate code with or without nested scopes. In Python 2.3, there's no need to support both variants anymore. It's also complex because it avoids circular dependencies at the class level by initializing references in an initClass() method.
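The initClass() idea can be sketched roughly like this. The class names are simplified stand-ins rather than the compiler package's actual classes, and the exact point at which initClass() runs is an assumption made for illustration.

```python
class CodeGenerator:
    # Filled in lazily by initClass(). FunctionCodeGenerator cannot be named
    # here because it is defined later and itself subclasses CodeGenerator.
    FunctionGen = None
    _initialized = False

    def __init__(self):
        cls = type(self)
        # Look in the class's own __dict__ so each subclass is wired up once.
        if not cls.__dict__.get("_initialized", False):
            self.initClass()
            cls._initialized = True

    def initClass(self):
        """Hook: set class-level references to collaborating classes."""


class WiringMixin:
    def initClass(self):
        # By the time an instance exists, every class has been defined.
        type(self).FunctionGen = FunctionCodeGenerator


class ModuleCodeGenerator(WiringMixin, CodeGenerator):
    pass


class FunctionCodeGenerator(WiringMixin, CodeGenerator):
    pass


before = ModuleCodeGenerator.FunctionGen
ModuleCodeGenerator()
after = ModuleCodeGenerator.FunctionGen
print(before, after.__name__)
```

The cost is that the class attributes are meaningless until the first instance is created, and a subclass that wants different collaborators has to know about the hook.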

The particular problem for this project was that RCompile extends all the top-level compile mode classes, i.e. "single", "eval", and "exec", and the function code generator. The compile mode classes are all related by inheritance with an abstract base class that defines stub methods that subclasses must override. I wanted to provide an implementation of one of those stub methods that all the subclasses share, which proved difficult. It was also hard to connect the various code generators, because of the initClass() magic.
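One general way to share a single override across sibling subclasses in Python is a mixin listed ahead of each base. The sketch below shows only that general technique; every class and method name in it is hypothetical, and it is not a description of how RCompile is actually organized.

```python
class AbstractCompileMode:
    mode = None

    def unpack_ops(self, count):
        # Stub that each concrete compile mode is expected to override.
        raise NotImplementedError("subclass must override")


class Interactive(AbstractCompileMode):
    mode = "single"


class Expression(AbstractCompileMode):
    mode = "eval"


class Module(AbstractCompileMode):
    mode = "exec"


class GuardedUnpackMixin:
    """One shared implementation of the stub for every restricted mode."""

    def unpack_ops(self, count):
        return ["LOAD_GLOBAL guard", "ROT_TWO", "CALL_FUNCTION 1",
                "UNPACK_SEQUENCE %d" % count]


# Listing the mixin first puts it ahead of the abstract base in the
# method resolution order, so its unpack_ops() is the one that is found.
class RInteractive(GuardedUnpackMixin, Interactive):
    pass


class RExpression(GuardedUnpackMixin, Expression):
    pass


class RModule(GuardedUnpackMixin, Module):
    pass


for cls in (RInteractive, RExpression, RModule):
    print(cls.mode, cls().unpack_ops(2))
```

This only stays simple while the shared method needs nothing from the classes wired up through initClass(); once it does, the two mechanisms start to interfere.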