Jeremy Hylton : weblog : March 2004 last modified Thu Mar 17 01:11:16 2005

Jeremy Hylton's Web Log, March 2004

PyCon sprints, day 1

Saturday, March 20, 2004

PyCon sprints started today. There were several people working in each room by the time I arrived at GWU on Saturday morning. I did a quick headcount after lunch; it looked like we had about 35 people working then.

The core sprint was well attended. There was a big contingent of locals, including Andrew, Barry, Fred, Neal, Neil, and Tim. Nick Bastin and some other folks from OPNET helped out.

My project for the weekend was the AST branch. Tim Peters observed that we will have to finish the AST branch this year or make it the annual sprint topic. Neal Norwitz, Neil Schemenauer, and I got to work on the list of open bugs. Fixed: doc strings, name mangling (__variables), partial fix for encoding declarations.

It was a great help to have so many core developers around. Martin von Löwis helped Neil with Unicode issues, and Neil and Armin Rigo looked at stack depth calculation. They concluded that we would be better off calculating stack depth incrementally, just like the old compiler. Opcodes like MAKE_CLOSURE have stack effects that depend on the previous value on the stack, making a separate calculation difficult. (The compiler package does a conservative approximation of the correct depth.)
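
A minimal sketch of what "incrementally" could look like, assuming a toy code emitter rather than the real compiler. The `Emitter` class, its method names, and the stack layout for MAKE_CLOSURE (defaults, then cells, then the code object) are illustrative assumptions. The point is only that the emitter already knows how many free variables it just pushed, so it can update the depth as it goes instead of reconstructing that fact from the bytecode afterwards.

```python
class Emitter:
    """Toy bytecode emitter (hypothetical) that tracks stack depth as it emits."""

    def __init__(self):
        self.code = []       # list of (opname, arg) pairs
        self.depth = 0       # current stack depth
        self.max_depth = 0   # high-water mark, i.e. the stack size to reserve

    def emit(self, op, arg, effect):
        # 'effect' is the net change in stack depth caused by this instruction.
        self.code.append((op, arg))
        self.depth += effect
        assert self.depth >= 0, "stack underflow in emitted code"
        self.max_depth = max(self.max_depth, self.depth)

    def load_const(self, value):
        self.emit("LOAD_CONST", value, +1)

    def load_closure(self, name):
        self.emit("LOAD_CLOSURE", name, +1)

    def make_closure(self, code_const, freevars, ndefaults=0):
        # Push one cell per free variable, then the code object.
        for name in freevars:
            self.load_closure(name)
        self.load_const(code_const)
        # Assumed effect: pops the code object, the cells and the defaults,
        # then pushes the new function object.
        popped = 1 + len(freevars) + ndefaults
        self.emit("MAKE_CLOSURE", ndefaults, 1 - popped)


emitter = Emitter()
emitter.make_closure("<code for h>", ["x"])
print(emitter.depth, emitter.max_depth)
```

A separate pass over the finished bytecode would see only `MAKE_CLOSURE 0` and would have to work out what code object was on top of the stack to learn how many cells it consumes.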

The major effort for the AST branch is done. Code generation works for nearly every source construct. The remaining work is slow going, though. The bugs tend to be subtle and take a long time to fix. We started on fixing some problems with closures, but didn't get very far by the end of the day.

  • The representation for function doc strings seems too magical: It is the first element of co_consts, unless that element is None. Why not have an explicit co_docstring slot?
  • The line number table is so delicate and restricted. Armin Rigo suggested a scheme for mapping ranges of bytecodes to ranges of lines.
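
The doc string convention from the first bullet is easy to observe from Python itself. This is a small illustration, assuming a CPython 3 interpreter (where the code object is reached through `__code__`) run without doc string stripping; the function `documented` is made up for the example.

```python
def documented():
    "Return the answer."
    return 42


code = documented.__code__
# The doc string is not stored in a dedicated slot on the code object;
# it is simply the first constant.
print(code.co_consts[0])
print(documented.__doc__ == code.co_consts[0])
```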

    It was easy to get distracted with so many core sprints in one place. Nick Bastin was working on a change to the profiler so that it tracks calls to builtin and C extension functions. Armin and Martin looked at a variety of ways to speed up Python functions. They wanted to streamline frame allocation. One possibility discussed on python-dev was to cache unused frames on the code object so that there was less initialization to do; another was to make frames smaller. I think they concluded that most of the benefit was achieved by replacing memset() with loops. Ha!

    We discussed a lot of ideas for making frames smaller, which I still think is a good idea. A smaller frame uses less memory. If it is small enough to use pymalloc, we might not need a custom freelist for frames. Some options for shrinking frames: Eliminate the block stack or allocate only the space that is needed for it. Remove fields that can be calculated from the code object. Get rid of the thread state.

    I think we concluded that it might be possible to remove the block stack completely. The block stack is used for try/except, try/finally, and for breaking out of loops. Eliminating it would save space. We could also eliminate the opcodes that set up loops and exception handlers if we could determine all the details at compile time, e.g. with an exception handling table.
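
One way to picture an exception handling table is a list of bytecode ranges, each paired with the offset of its handler, built at compile time and consulted only when an exception is raised. The sketch below is a toy under that assumption; the `Handler` tuple, the offsets, and the "smallest enclosing range wins" rule are illustrative choices, not a description of any actual design discussed at the sprint.

```python
from typing import NamedTuple, Optional


class Handler(NamedTuple):
    start: int   # first protected bytecode offset (inclusive)
    end: int     # end of the protected range (exclusive)
    target: int  # offset to jump to when an exception is raised in range


def find_handler(table, offset) -> Optional[int]:
    """Return the handler offset for the innermost range covering 'offset'."""
    candidates = [h for h in table if h.start <= offset < h.end]
    if not candidates:
        return None  # no handler in this frame: the exception propagates
    # Nested try blocks produce nested ranges, so the narrowest one is innermost.
    innermost = min(candidates, key=lambda h: h.end - h.start)
    return innermost.target


# An outer try covering offsets 0..39 and a nested try covering 10..19.
table = [Handler(0, 40, 100), Handler(10, 20, 60)]
print(find_handler(table, 12))
print(find_handler(table, 5))
print(find_handler(table, 50))
```

With a table like this, code that never raises does no setup work at run time, which is the saving the block stack opcodes would otherwise cost.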

    PyCon sprints, day 2

    Sunday, March 21, 2004

    Another good day of sprints. We fixed some hard bugs in the AST branch and had a planning session for Python 2.4.

    There were more people around for day 2 of the sprints. Jim Fulton gave a day-long Zope 3 tutorial for about 10 people. (We hit a snag getting a projector for Jim, but Steve Holden and the Cafritz Center staff worked it out pretty easily.) I'd guess there were about 50 people there by the afternoon.

    We made better progress on closure bugs from the AST branch today. Yesterday we got stuck trying to figure out where the compiler was going wrong. With a fresh start today, it was pretty straightforward.

    The AST branch has a new symbol table that has a simpler approach for deciding the scope of variables. It works in two completely separate passes over a module. (The old symbol table tried to work incrementally, revisiting child nodes as their parents were processed. Very complicated.) The first pass gathers evidence about each variable -- whether it's assigned to, passed as a parameter, bound by import, used but not defined, etc. The second pass works top-to-bottom to determine the scope -- local, global, free, or cell. The bindings visible in each function are passed in during this pass.
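
The two-pass idea can be sketched in a few dozen lines on top of the standard `ast` module. This is a toy, not the AST branch's symbol table: the `Scope`, `Collector`, `analyze`, and `classify` names are invented here, and the sketch ignores classes, lambdas, imports, comprehensions, and `global` declarations. Pass 1 only records which names each scope binds and uses; pass 2 walks down from the module, handing each function the set of bindings visible from enclosing functions.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Scope:
    name: str
    is_module: bool = False
    bound: set = field(default_factory=set)     # pass 1: assigned, parameter, def
    used: set = field(default_factory=set)      # pass 1: referenced
    children: list = field(default_factory=list)
    result: dict = field(default_factory=dict)  # pass 2: name -> scope kind


class Collector(ast.NodeVisitor):
    """Pass 1: gather evidence about names, one Scope per function."""

    def __init__(self):
        self.root = self.current = Scope("<module>", is_module=True)

    def visit_FunctionDef(self, node):
        self.current.bound.add(node.name)
        child = Scope(node.name, bound={a.arg for a in node.args.args})
        self.current.children.append(child)
        parent, self.current = self.current, child
        for stmt in node.body:
            self.visit(stmt)
        self.current = parent

    def visit_Name(self, node):
        is_store = isinstance(node.ctx, ast.Store)
        (self.current.bound if is_store else self.current.used).add(node.id)


def analyze(scope, visible):
    """Pass 2: classify names; return the names this scope needs from outside."""
    # Module-level bindings are deliberately not made visible to functions.
    child_visible = visible if scope.is_module else visible | scope.bound
    needed = set()
    for child in scope.children:
        needed |= analyze(child, child_visible)
    for name in scope.bound:
        if scope.is_module:
            scope.result[name] = "global"
        else:
            scope.result[name] = "cell" if name in needed else "local"
    free = set()
    for name in (scope.used | needed) - scope.bound:
        if name in visible:
            scope.result[name] = "free"
            free.add(name)
        else:
            scope.result[name] = "global"
    return free


def classify(source):
    collector = Collector()
    collector.visit(ast.parse(source))
    analyze(collector.root, set())
    return collector.root
```

Calling `classify(source)` returns the module scope; each nested `Scope` carries a `result` dictionary mapping names to "local", "global", "free", or "cell".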

    We found two bugs in the symbol table. The first bug was with cases like this:

    def f():
        x = 1
        def g():
            def h():
                return x
            return h
        return g
    

    The symbol table did not handle g() correctly. It wasn't generating any symbols for g(). It needed to mark x as free in g(), so that the code generator would build a closure to pass the binding of x through to h.
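
The intended result is visible on the code objects of a current CPython 3 interpreter, where `__code__` plays the role that `func_code` did in 2004. The snippet below repeats the example and inspects the attributes that record cell and free variables.

```python
def f():
    x = 1
    def g():
        def h():
            return x
        return h
    return g


g = f()
h = g()

print(f.__code__.co_cellvars)  # x lives in a cell created by f
print(g.__code__.co_freevars)  # g never uses x, but must pass it through
print(h.__code__.co_freevars)  # h is the scope that actually reads x
print(h())
```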

    When we fixed that bug, we introduced another related bug. The symbol table was marking variables free instead of global. The second pass was including the bindings at module scope in the set of visible bindings passed to functions, but it should only have passed bindings from other function scopes. If the only binding for a variable is at module level, it's treated as global rather than free. (That's an implementation-centric notion. They're all "free variables" in the academic sense, but Python has special rules for the top level.)
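
The distinction shows up clearly on a current CPython 3 interpreter. In this small example (the names `y`, `z`, `outer`, and `inner` are made up), `y` is bound only at module level and `z` is bound in an enclosing function, so only `z` ends up as a free variable of `inner`.

```python
y = 1  # bound only at module level


def outer():
    z = 2  # bound in an enclosing function scope

    def inner():
        return y + z

    return inner


inner = outer()
print(inner.__code__.co_freevars)       # only z is reached through a closure
print("y" in inner.__code__.co_names)   # y is looked up by name as a global
```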

    We fixed some other simple problems. Generators weren't getting the right flag set on the code object, so they weren't being called as generators. And we weren't passing through compiler flags set by future statements, which caused a few failures. We also discovered that we haven't finished code generation for extended slices.
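
The generator flag mentioned above can be checked from Python. This sketch assumes a CPython 3 interpreter and uses the `CO_GENERATOR` constant from the standard `inspect` module; the helper `is_generator_code` and the two sample functions are hypothetical names for the illustration.

```python
import inspect


def plain():
    return 1


def gen():
    yield 1


def is_generator_code(func):
    """True if the function's code object carries the generator flag."""
    return bool(func.__code__.co_flags & inspect.CO_GENERATOR)


print(is_generator_code(plain))
print(is_generator_code(gen))
```

If the compiler forgets to set this flag, calling the function runs its body directly instead of returning a generator object, which matches the symptom described above.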

    There is still a lot of tedious bug fixing to do, but the branch is in much better shape. setup.py compiles and runs correctly now. You can actually run "make test" without crashing. Many tests fail, but the majority run successfully. It's much easier to track down bugs when the regression tests are available.

    Guido was out sick today, but he asked us to have a Python 2.4 planning session anyway. Lots of the locals (me included, even though I'm not really local anymore) were only around for the weekend.

    My chief goal is to finish the AST branch in April so that it can be included in Python 2.4. We agreed that it would be included if it was ready by early May. If not, we'll wait for a future release. If it does go in, we will probably need an extra alpha or beta release to make sure we flush out any bugs. Armin Rigo also pointed out that we'll need to coordinate work on the new compiler with work on new features like generator expressions that require compiler changes.

    Anthony Baxter is going to be the release manager again. No one else volunteered, hardly a surprise, but Anthony's been very capable.

    There aren't a lot of new features going into the 2.4 release. It feels more like a release to stay on schedule than a release to get good new features in the hands of users. Generator expressions and function decorators are the top new features, but neither seems likely to cause lots of people to upgrade. Perhaps Raymond Hettinger's micro-optimizations will be the big news, but it's hard to judge what effect they have on real program performance.

    We definitely need to work on the PEP for generator expressions. Guido jumped the gun by approving the PEP, because we didn't follow the regular PEP process. There's no specification or rationale, just a rough description and a few examples. I'm glad Guido approved the feature, but we need to go back and write the specification now. (I noticed today, Tuesday, that Guido is having second thoughts about the funny namespace rules that are being proposed.)

    PyCon attendance

    Tuesday, March 23, 2004

    There's going to be a crowd at PyCon -- 320 confirmed registrations! And some of the PyCon papers are online.

    It's going to be a challenge to accommodate that many people. The rooms we are planning to use for talks only hold around 100 people. Apparently, it's not possible to use the main ballroom for regular talks, because it would interfere with catering plans. The ballroom will even be at capacity for the keynote talks each morning.

    It's wonderful that so many people will be coming. I think it will be a little larger than any of the earlier Python conferences -- certainly a lot bigger than PyCon 2003.