Jeremy Hylton : weblog : 2004-03-21

PyCon sprints, day 2

Sunday, March 21, 2004

Another good day of sprints. We fixed some hard bugs in the AST branch and had a planning session for Python 2.4.

There were more people around for day 2 of the sprints. Jim Fulton gave a day-long Zope 3 tutorial for about 10 people. (We hit a snag getting a projector for Jim, but Steve Holden and the Cafritz Center staff worked it out pretty easily.) I'd guess there were about 50 people there by the afternoon.

We made better progress on the closure bugs in the AST branch today. Yesterday we got stuck trying to figure out where the compiler was going wrong. With a fresh start today, it was pretty straightforward.

The AST branch has a new symbol table that takes a simpler approach to deciding the scope of variables. It works in two completely separate passes over a module. (The old symbol table tried to work incrementally, revisiting child nodes as their parents were processed. Very complicated.) The first pass gathers evidence about each variable -- whether it's assigned to, passed as a parameter, bound by import, used but not defined, etc. The second pass works top-to-bottom to determine the scope -- local, global, free, or cell. The bindings visible in each function are passed in during this pass.
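One way to look at this kind of classification is through the symtable module in CPython's standard library. This is only a sketch: it shows what a current CPython reports, not the AST branch's code, and the sample source and all the names in it are made up for illustration.

```python
import symtable

# Hypothetical sample module: an import, a module-level binding,
# and a nested function that uses names from three different scopes.
SOURCE = """
import os
y = 0
def f(a):
    x = 1
    def g():
        return x + y + a
    return g
"""

top = symtable.symtable(SOURCE, "<example>", "exec")
f_table = top.lookup("f").get_namespace()
g_table = f_table.lookup("g").get_namespace()

# Evidence about how each name is bound.
print("os imported at module level:", top.lookup("os").is_imported())
print("a is a parameter of f:", f_table.lookup("a").is_parameter())
print("x is assigned in f:", f_table.lookup("x").is_assigned())

# The scope decided for each name used inside g.
for name in ("x", "a", "y"):
    sym = g_table.lookup(name)
    print(name, "free:", sym.is_free(), "global:", sym.is_global())
```

Here x and a should come out free in g, because f binds them, while y should come out global, because its only binding is at module level.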

We found two bugs in the symbol table. The first bug was with cases like this:

def f():
    x = 1
    def g():
        def h():
            return x
        return h
    return g

The symbol table did not handle g() correctly. It wasn't generating any symbols for g(). It needed to mark x as free in g(), so that the code generator would build a closure to pass the binding of x through to h.
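The intended result can be checked on a working interpreter by looking at the code objects. This is a small sketch that repeats the example and inspects the standard co_cellvars and co_freevars attributes; it shows the behavior of a released CPython, not the state of the branch.

```python
def f():
    x = 1
    def g():
        def h():
            return x
        return h
    return g

g = f()
h = g()

# f owns the binding, so x is a cell variable there.
print(f.__code__.co_cellvars)   # ('x',)
# g never uses x itself, but must carry it through to h.
print(g.__code__.co_freevars)   # ('x',)
print(h.__code__.co_freevars)   # ('x',)
print(h())                      # 1
```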

When we fixed that bug, we introduced another related bug. The symbol table was marking variables free instead of global. The second pass was including the bindings at module scope in the set of visible bindings passed to functions, but it should only have passed bindings from other function scopes. If the only binding for a variable is at module level, it's treated as global rather than free. (That's an implementation-centric notion. They're all "free variables" in the academic sense, but Python has special rules for the top level.)
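The global-versus-free distinction is visible on the code object as well. In the sketch below the sample source is compiled from a string so that x is unambiguously bound at module scope; the names are hypothetical and the behavior shown is that of a released CPython.

```python
# Hypothetical module source: x is bound only at module level.
SOURCE = '''
x = "module level"
def outer():
    def inner():
        return x
    return inner
'''

namespace = {}
exec(compile(SOURCE, "<example>", "exec"), namespace)
inner = namespace["outer"]()

# No closure is built: x is looked up as a global at run time.
print(inner.__code__.co_freevars)
print(inner.__closure__)
print("x" in inner.__code__.co_names)
print(inner())
```

If module-level bindings were wrongly passed down as visible bindings, x would show up in co_freevars instead of co_names.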

We fixed some other simple problems. Generators weren't getting the right flag set on the code object, so they weren't being called as generators. And we weren't passing through compiler flags set by future statements, which caused a few failures. We also discovered that we haven't finished code generation for extended slices.
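For reference, the generator flag is one bit in the code object's co_flags, and the inspect module exposes it as CO_GENERATOR. A minimal sketch of checking it on a current CPython follows; the helper is_generator_code is a made-up name for this example.

```python
import inspect

def gen():
    yield 1

def plain():
    return 1

def is_generator_code(func):
    """Return True if the function's code object has the generator flag set."""
    return bool(func.__code__.co_flags & inspect.CO_GENERATOR)

print(is_generator_code(gen))
print(is_generator_code(plain))
```

Without that bit, calling gen() would run the body as an ordinary function instead of returning a generator object.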

There is still a lot of tedious bug fixing to do, but the branch is in much better shape. setup.py compiles and runs correctly now. You can actually run "make test" without crashing. Many tests fail, but the majority run successfully. It's much easier to track down bugs when the regression tests are available.

Guido was out sick today, but he asked us to have a Python 2.4 planning session anyway. Lots of the locals (me included, even though I'm not really local anymore) were only around for the weekend.

My chief goal is to finish the AST branch in April so that it can be included in Python 2.4. We agreed that it would be included if it was ready by early May. If not, we'll wait for a future release. If it does go in, we will probably need an extra alpha or beta release to make sure we flush out any bugs. Armin Rigo also pointed out that we'll need to coordinate work on the new compiler with work on new features like generator expressions that require compiler changes.

Anthony Baxter is going to be the release manager again. No one else volunteered, hardly a surprise, but Anthony's been very capable.

There aren't a lot of new features going into the 2.4 release. It feels more like a release to stay on schedule than a release to get good new features in the hands of users. Generator expressions and function decorators are the top new features, but neither seems likely to cause lots of people to upgrade. Perhaps Raymond Hettinger's micro-optimizations will be the big news, but it's hard to judge what effect they have on real program performance.

We definitely need to work on the PEP for generator expressions. Guido jumped the gun by approving the PEP, because we didn't follow the regular PEP process. There's no specification or rationale, just a rough description and a few examples. I'm glad Guido approved the feature, but we need to go back and write the specification now. (I noticed today, Tuesday, that Guido is having second thoughts about the funny namespace rules that are being proposed.)