Jeremy Hylton : weblog : December 2003 last modified Thu Mar 17 01:11:16 2005

Jeremy Hylton's Web Log, December 2003

"Groovy" Language

permanent link
Monday, December 01, 2003

Seen on the ll1 list: Groovy is a concise & powerful dynamically typed OO language for the JVM. Python benefits a lot from having good builtin datatypes. If Java just added those, how much better would it be?

We want a great high level scripting language that's designed specifically for the JVM. Ruby, Python & Dylan are cool & are their own platform with their own APIs and conventions. We want a scripting language that builds on the JDK and all the Java conventions & tools, but just adds the nice high level language features of static and/or dynamic typing, closures, native syntax for maps, lists, beans, events, markup, XPath etc.

One of the examples from the wiki looks like the kind of Java code I would probably write if I tried to program in Java (except that I wouldn't use a list to store a heterogeneous collection).

class MyTest extends GroovyTestCase {
    private foo = new Foo()

    void testFoo() throws Exception {
        map = ['foo':'abc', 'bar':123, 'xyz':'edam']
        list = [1.23, 1234, 'hello']
        foo.doSomething(map, list)
    }
}

I wonder what the future of scripting languages for Java is. Jython is a good choice, but it isn't a trivial migration from Java to Python, and the developer community is too small. A Groovy-like language has a shorter migration path, but it's yet another new language. How good does a new language need to be to gain traction?

An Interpreter Prompt on the AST Branch

permanent link
Tuesday, December 02, 2003

I made a little progress on the AST branch today: By fixing a few bugs and disabling use of sre on startup, I was able to bring up a normal interactive interpreter session. I don't have much time to work on Python, but it would be really great if the AST branch could be finished for Python 2.4.

I had been meaning to finish support for *varargs and **kwargs for months, but never found the time. It was all pretty simple, once I got back into the code. The only hitch was to add flags to the symbol table, which the assembler reads to set the code object flags for CO_VARARGS and CO_VARKEYWORDS.
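Those flags end up on the code object of any function that takes *varargs or **kwargs, so there's a quick sanity check from Python itself (written with the modern __code__ attribute; in the Python of this era it was spelled func_code):

```python
# CO_VARARGS / CO_VARKEYWORDS flag values from CPython's code objects
CO_VARARGS = 0x0004
CO_VARKEYWORDS = 0x0008

def f(a, *args, **kwargs):
    return a

flags = f.__code__.co_flags
assert flags & CO_VARARGS        # set because f accepts *args
assert flags & CO_VARKEYWORDS    # set because f accepts **kwargs
```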

Getting sre to compile correctly will be a good stress test now that there's minimal interactive support. It has several very large functions. At the moment, _parse() is failing at runtime.

Another good test would be to get setup.py to execute correctly, so that a make will finish without error.

One random idea for redesigning the AST has occurred to me a few times: Put the default argument values in a separate AST node that contains the FunctionDef node. Right now it's a real hassle to evaluate the defaults before pushing a new code block. If the defaults were in the parent node, then the FunctionDef's children would all be executed in the function's environment.
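The reason the defaults must be evaluated before entering the new code block is visible at the language level: default expressions run in the enclosing scope, once, at def time. A small illustration:

```python
x = 10

def f(a=x):      # the default expression `x` is evaluated here, in the enclosing scope
    return a

x = 99           # rebinding x afterwards has no effect on the stored default

assert f() == 10
```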

Python Bounties

permanent link
Wednesday, December 03, 2003

Mark Shuttleworth is offering to fund Python-related projects. He has offered a bounty for Python Scripting Everywhere: scripting interfaces for applications like OpenOffice, Blender, AbiWord, Gnumeric, and the GIMP.

Python is a great glue language. Many of my favourite tools are already scriptable in Python. I'm open to requests for funding work that needs to be done to make Python the most widespread common scripting language on the net.

The 2004 budget for bounties is $100,000. He is also soliciting proposals for school administration software in Python and work on Mozilla.

Mark founded Thawte and sold it to Verisign. He was also the first African in Space. (Probably the first Python programmer, too.)

Working Out Kinks in Cache Simulation

permanent link
Thursday, December 04, 2003

I've spent a lot of time this week working out bugs in the cache simulations. Many of the results reported in earlier blog entries -- both the cache access performance numbers and the 2Q simulation results -- were wrong. That seems to be par for the course: cache simulations are complicated, and understanding simulation results means checking them for consistency ten different ways. Eventually you understand what the numbers are telling you and how they relate to each other.

The new 2Q implementation is probably correct, but doesn't produce the dramatically good results reported earlier. It's still better than LRU by a non-trivial margin (a 5% difference in hit rate) for large caches.
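For comparison, the LRU baseline for variable-sized objects is simple enough to sketch in a few lines. This is my own toy version, not the actual simulator code; the class and method names are made up:

```python
from collections import OrderedDict

class LRUSim:
    """Byte-limited LRU cache: evict least-recently-used objects until the new one fits."""

    def __init__(self, limit):
        self.limit = limit              # cache capacity in bytes
        self.cache = OrderedDict()      # oid -> size, least recently used first
        self.size = 0
        self.loads = self.hits = 0

    def load(self, oid, size):
        self.loads += 1
        if oid in self.cache:
            self.hits += 1
            self.cache.move_to_end(oid)  # most recently used moves to the back
            return
        self.cache[oid] = size
        self.size += size
        while self.size > self.limit:    # evict from the front (the LRU end)
            _, evicted_size = self.cache.popitem(last=False)
            self.size -= evicted_size

# A tiny trace: six loads of 40-byte objects into a 100-byte cache.
sim = LRUSim(limit=100)
for oid in [1, 2, 1, 3, 1, 2]:
    sim.load(oid, size=40)
```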

The ARC algorithm has been very difficult to implement for a cache with variable-sized objects. Much of the specific logic in their pseudocode changes when you account for object sizes instead of fixed-size pages. For example, when you add a new object, the objects you choose to evict may be smaller than it, so you may have to evict several of them.

The other interesting problem with the ARC cache is that it seems to adjust its p factor too slowly. In the paper, they say that a step of 1 adapts quickly enough, but they only simulate tiny caches. For a very large cache (like a 500MB ZEO cache), it adapts far too slowly. The paper boasts that ARC has no tuning parameters, but I needed to tune the adjustment rate to get decent results.
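The tuning in question is the step applied to p on each ghost-list hit. Here is roughly what that adaptation step looks like with a "walk factor" multiplier bolted on; the names are mine, and this sketch leaves out the rest of ARC entirely:

```python
def adapt_p(p, hit_in_b1, b1_len, b2_len, cache_size, walk_factor=500):
    """One ARC adaptation step, scaled by walk_factor.

    A hit in the recency ghost list (B1) grows p (more room for
    recently-seen objects); a hit in the frequency ghost list (B2)
    shrinks it.  The paper effectively uses walk_factor == 1.
    """
    if hit_in_b1:
        delta = max(b2_len // b1_len, 1) if b1_len else 1
        return min(p + delta * walk_factor, cache_size)
    else:
        delta = max(b1_len // b2_len, 1) if b2_len else 1
        return max(p - delta * walk_factor, 0)

# Starting from p == 0, one B1 ghost hit moves p by a full walk_factor:
p = adapt_p(0, True, b1_len=10, b2_len=5, cache_size=50_000_000)
```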

We're still stuck trying to find a decent implementation of an LRU list when the objects in the list are stored on disk rather than in memory. The Iyengar paper doesn't cover replacement at all, even though it seems to say that a cache is typically in an equilibrium where it must always be doing replacement.

For the record, here are apparently correct simulation results for LRU and 2Q on the zope.org cache trace.

LRUCacheSimulation, cache size 50,000,000 bytes
  START TIME  DURATION    LOADS     HITS INVALS WRITES HITRATE EVICTS
Nov 25 00:36   2:19:55    51340    15399     47    116   30.0%  28391
Nov 25 02:56   1:55:01    54952    14318     26    442   26.1%  31652
Nov 25 04:51   1:30:01    72530    23387     45     97   32.2%  41593
Nov 25 06:21   1:00:13    46689    15161     16    111   32.5%  25083
Nov 25 07:21   1:45:09    53628    17812     46    143   33.2%  28617
Nov 25 09:06   1:15:02    58238    19730     23    163   33.9%  30879
Nov 25 10:21   2:05:01   126466    36876    527   4751   29.2%  84047
Nov 25 12:26   1:15:03    51919    15501     22     68   29.9%  25989
Nov 25 13:41   2:45:03    69965    17362    107    296   24.8%  45034
Nov 25 16:26   4:19:34    37131    11960      3     38   32.2%  21452
Nov 25 20:46   6:05:06   110716    26365    140   2605   23.8%  78027
Nov 26 02:51   2:04:57    79765    24191    438   1157   30.3%  48735
Nov 26 04:56     40:01    53910    16997     63    187   31.5%  28491
Nov 26 05:36   1:15:11    51677    14870      8    456   28.8%  30886
Nov 26 06:51   1:00:08    51355    13124     16     92   25.6%  30407
Nov 26 07:51   1:04:56    51923    16151      5     68   31.1%  28086
Nov 26 08:56   1:05:13    66551    21848     12     90   32.8%  37771
Nov 26 10:01     50:04    48137    17088      6     80   35.5%  22723
Nov 26 10:51     28:00    40506    15784      3     70   39.0%  16731
Nov 25 00:36  34:43:38  1177398   353924   1553  11030   30.1% 684594 OVERALL

TwoQSimluation, cache size 50,000,000 bytes
  START TIME  DURATION    LOADS     HITS INVALS WRITES HITRATE EVICTS HOTHIT
Nov 25 00:36   2:19:55    51340    14223     71    116   27.7%  29929     90
Nov 25 02:56   1:55:01    54952    17129     28    442   31.2%  36735   5932
Nov 25 04:51   1:30:01    72530    29424     52     97   40.6%  43540  13274
Nov 25 06:21   1:00:13    46689    19995     17    111   42.8%  27212   7724
Nov 25 07:21   1:45:09    53628    19438     46    143   36.2%  34643   7832
Nov 25 09:06   1:15:02    58238    22877     15    163   39.3%  34734   9308
Nov 25 10:21   2:05:01   126466    45287    432   4751   35.8%  81699  17469
Nov 25 12:26   1:15:03    51919    18809     19     68   36.2%  28539   7551
Nov 25 13:41   2:45:03    69965    14727    107    296   21.0%  59106  11076
Nov 25 16:26   4:19:34    37131    15132      2     38   40.8%  23250   4213
Nov 25 20:46   6:05:06   110716    27964    121   2605   25.3%  82201  16686
Nov 26 02:51   2:04:57    79765    28798    292   1157   36.1%  50332  12103
Nov 26 04:56     40:01    53910    20701     60    187   38.4%  32999   7766
Nov 26 05:36   1:15:11    51677    19818      9    456   38.3%  31505   6998
Nov 26 06:51   1:00:08    51355    18963     11     92   36.9%  30628   8204
Nov 26 07:51   1:04:56    51923    21585      6     68   41.6%  32241   7465
Nov 26 08:56   1:05:13    66551    27667     13     90   41.6%  38881   9298
Nov 26 10:01     50:04    48137    20918      7     80   43.5%  26887   6758
Nov 26 10:51     28:00    40506    19589      4     70   48.4%  20435   5447
Nov 25 00:36  34:43:38  1177398   423044   1312  11030   35.9% 745496 165194 OVERALL

The ARC simulator is barely functional. There are a few corner cases where I don't understand what it's doing, so I doubt it's right. I'm also ignoring invalidations and writes, but it's producing basic results that make some sense. It looks like it's favoring the LRU component (L2) over the FIFO component (L1) at the expense of performance. Compared to 2Q, which devotes more space to the FIFO component, ARC gets more LRU hits but many fewer FIFO hits.

It's difficult to say if these results are representative of ARC or are caused by bugs in my implementation. (Likely the latter given my recent track record and the hour.)

The first chart is with a "walk factor" of 500: whenever the algorithm adjusts p, it multiplies the basic adjustment by the walk factor.

ARCCacheSimulation, cache size 50,000,000 bytes
  START TIME  DURATION    LOADS     HITS INVALS WRITES HITRATE LRUTHITS  EVICTS       P
Nov 25 00:36   2:19:55    51340    13001      0    116   25.3%     170   32170 2332000
Nov 25 02:56   1:55:01    54952    17790      0    442   32.4%    8280   33913 1145500
Nov 25 04:51   1:30:01    72530    18813      0     97   25.9%    8983   55903 5223500
Nov 25 06:21   1:00:13    46689    15301      0    111   32.8%    3566   31520 5353500
Nov 25 07:21   1:45:09    53628    15866      0    143   29.6%    4885   35772 6715500
Nov 25 09:06   1:15:02    58238    16603      0    163   28.5%    4693   42441 6187500
Nov 25 10:21   2:05:01   126466    38843      0   4751   30.7%   18441   90298  508500
Nov 25 12:26   1:15:03    51919    11498      0     68   22.1%    2424   38796 1910000
Nov 25 13:41   2:45:03    69965    10883      0    296   15.6%    8175   57130 7010000
Nov 25 16:26   4:19:34    37131     9392      0     38   25.3%    2146   23714 8427500
Nov 25 20:46   6:05:06   110716    32635      0   2605   29.5%   31363   83417 4433000
Nov 26 02:51   2:04:57    79765    22837      0   1157   28.6%    8656   58531 5609000
Nov 26 04:56     40:01    53910    14455      0    187   26.8%    3025   38794 6279000
Nov 26 05:36   1:15:11    51677    14862      0    456   28.8%    4006   36383 7866000
Nov 26 06:51   1:00:08    51355    13421      0     92   26.1%    2902   37051 10099500
Nov 26 07:51   1:04:56    51923    15462      0     68   29.8%    2922   38083 9251000
Nov 26 08:56   1:05:13    66551    21951      0     90   33.0%    4321   44048 9926000
Nov 26 10:01     50:04    48137    16766      0     80   34.8%    3382   31469 8320000
Nov 26 10:51     28:00    40506    15277      0     70   37.7%    3165   23202 7099500
Nov 25 00:36  34:43:38  1177398   335656      0  11030   28.5%  125505  832635 7099500 OVERALL

In the couple of places where ARC was competitive with 2Q, the LRU hit rate was very high. Everywhere else 2Q appeared to win by devoting more space to the LRU component. (This oversimplifies, because 2Q is also more hesitant to promote objects to the LRU component.) Since ARC was never using much of the cache for FIFO, I tried changing the walk factor to 2000. It showed a small improvement, but not much.

ARCCacheSimulation, cache size 50,000,000 bytes
  START TIME  DURATION    LOADS     HITS INVALS WRITES HITRATE LRUTHITS  EVICTS       P
Nov 25 00:36   2:19:55    51340    13160      0    116   25.6%     163   32042 9064000
Nov 25 02:56   1:55:01    54952    16400      0    442   29.8%    5725   35860 10772000
Nov 25 04:51   1:30:01    72530    23346      0     97   32.2%    5752   50476 12750000
Nov 25 06:21   1:00:13    46689    15478      0    111   33.2%    2868   31651 10224000
Nov 25 07:21   1:45:09    53628    16411      0    143   30.6%    4215   35884 16850000
Nov 25 09:06   1:15:02    58238    19977      0    163   34.3%    4066   38113 16094000
Nov 25 10:21   2:05:01   126466    38994      0   4751   30.8%   16923   92417   14000
Nov 25 12:26   1:15:03    51919    12217      0     68   23.5%    3050   37720 5578000
Nov 25 13:41   2:45:03    69965    11217      0    296   16.0%    5612   58317 20374000
Nov 25 16:26   4:19:34    37131    13122      0     38   35.3%    1238   26363 20258000
Nov 25 20:46   6:05:06   110716    24968      0   2605   22.6%   22161   84526 11990000
Nov 26 02:51   2:04:57    79765    22640      0   1157   28.4%    7114   57781 9536000
Nov 26 04:56     40:01    53910    14579      0    187   27.0%    2378   38769 13806000
Nov 26 05:36   1:15:11    51677    16014      0    456   31.0%    3039   35352 18124000
Nov 26 06:51   1:00:08    51355    14413      0     92   28.1%    1852   35752 28536000
Nov 26 07:51   1:04:56    51923    16757      0     68   32.3%    1785   35976 26386000
Nov 26 08:56   1:05:13    66551    23303      0     90   35.0%    2963   40329 25508000
Nov 26 10:01     50:04    48137    17391      0     80   36.1%    1817   32920 20414000
Nov 26 10:51     28:00    40506    16211      0     70   40.0%    2987   24224 17756000
Nov 25 00:36  34:43:38  1177398   346598      0  11030   29.4%   95708  824472 17756000 OVERALL

Boosting the walk factor to 10,000 gets the overall hit rate up to 31.4%, a little better than LRU and still worse than 2Q.

Generating Custom Bytecode for Restricted Python

permanent link
Monday, December 08, 2003

One flaw in RestrictedPython was unchecked unpacking of tuples, lists, and other iterables in assignment statements, e.g. x, y = L. I finished the first cut at a solution today that uses the Python compiler package to generate custom bytecode that checks unpack operations.

This check is different from the others in RestrictedPython because it generates custom bytecode instead of rewriting the source tree. Most of the other checks replace an operation that would use a bytecode like LOAD_ATTR with a call to a Python function that has the same behavior but also checks security. It's impractical to implement sequence unpacking checks that way, because it would be difficult to generate new source code with the same effect: the unpacking can occur in the target of a for loop or a list comprehension, or in the argument list of a function, and each of those would require a very substantial source-to-source transformation.
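The generated bytecode routes the sequence through a runtime helper before UNPACK_SEQUENCE runs. The helper below is my own hypothetical sketch (the real RestrictedPython guard has a different name and signature), but it shows the shape of the check:

```python
def guarded_unpack(seq, count, guard):
    """Materialize seq, pass each element through the security guard,
    and return exactly `count` items for UNPACK_SEQUENCE to consume."""
    items = list(seq)
    if len(items) != count:
        raise ValueError("need exactly %d values to unpack" % count)
    return [guard(item) for item in items]

# In restricted code, `x, y = L` effectively becomes:
L = [1, 2]
x, y = guarded_unpack(L, 2, guard=lambda item: item)  # identity guard for the demo
```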

The bytecode generation is mostly straightforward. One wrinkle is that the visitor method that emits UNPACK_SEQUENCE is called after the value being unpacked is already on the stack. The visitor method needs to emit a ROT_TWO to get the checking function and its argument on the stack in the right order.

I need to re-organize the code generator classes in the compiler package. There is a lot of complexity that exists primarily to allow the compiler to generate code with or without nested scopes. In Python 2.3, there's no need to support both variants anymore. It's also complex because it avoids circular dependencies at the class level by initializing references in an initClass() method.

The particular problem for this project was that RCompile extends all the top-level compile mode classes, i.e. "single", "eval", and "exec," and the function code generator. The compile mode classes are all related by inheritance with an abstract base class that defines stub methods that subclasses must override. I want to provide an implementation of one of those stub methods that all the subclasses share, which proved difficult. It was also hard to connect the various code generators, because of the initClass() magic.

IronPython performance looks good

permanent link
Tuesday, December 09, 2003, 10:10 a.m.

Jim Hugunin is at it again: The early results with IronPython show that Python compiled to IL can run fast. On the standard pystone benchmark, IronPython-0.1 is 70% faster than CPython-2.3.

An earlier Python for .NET project at ActiveState ran slowly, but the project report didn't offer a convincing explanation for the slowness: The Microsoft support staff used their internal tools to determine that the IPyType interface lookup for a given .NET object is the biggest hot spot in the runtime. That is, the analysis was disconnected from the development.

Jim's message to the dotnet-language-dev group includes the following microbenchmark results:

                IronPython-0.1  Python-2.3      Python-2.1      Jython-2.1
pystone         0.58            1.00            1.29            1.61

function call   0.19            1.00            1.12            1.33
integer add     0.59            1.00            1.18            1.08
string.replace  0.92            1.00            1.00            1.40
range(bigint)   5.57            1.00            1.09            16.02
eval("2+2")     66.97           1.00            1.58            91.33

It's nice to see that function calls are so much faster. There's a lot of overhead in CPython for function calls -- pushing arguments on the stack, copying them into the callee's frame, finding the right builtins for the callee's frame, initializing the rest of the frame. It seems like there ought to be a simpler way, at least in the common case.
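You can get a rough feel for that overhead yourself with timeit. This harness is mine, not the one behind the IronPython numbers, so the absolute values won't match anything above:

```python
import timeit

def f():
    pass

n = 100_000
# n calls through the full frame setup machinery:
call_time = timeit.timeit(f, number=n)
# baseline: the timing loop executing a bare `pass`:
pass_time = timeit.timeit("pass", number=n)
# the difference is roughly the per-call overhead
```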

More Questions and Answers for the ZEO Cache

permanent link
Tuesday, December 09, 2003, 10:47 a.m.

The work on the ZEO cache has been delayed by some other customer work at Zope Corp. I wanted to capture the current state of our efforts for when we get back to it.

PyCon DC 2004

permanent link
Tuesday, December 09, 2003, 1:41 p.m.

The Python conference will be held March 20-26, 2004 at George Washington University in Washington, D.C. PyCon registration is open, and the final call for papers will be posted next week. Mitch Kapor is the keynote speaker.

The last PyCon was a big success in my book. See Mike Orr's conference report in Linux Journal for a recap. There was a good diversity of talks, but I liked the pre-conference sprint and the open space even better. It was a nice chance to hack with Neal Norwitz and Neil Schemenauer. It looks like there will be a good set of sprints for 2004, too, including Twisted, Zope2 and Zope3, and PyPy. More on the sprints soon.

The other great thing about PyCon is the cost. It's $175 for the three conference days if you register early and the pre-conference sprints are free. The only real costs are travel and lodging in DC.

A New Gadfly

permanent link
Tuesday, December 09, 2003, 4:24 p.m.

And Aaron Watters is at it again, too. The xsdb package provides fundamental concurrent database functionality with concurrency control and recovery.

There was a stealth reference to it on comp.lang.python last week, which I missed. The real announcement reached the Stackless list this afternoon.

A Toy Bytecode Verifier

permanent link
Wednesday, December 10, 2003, 11:52 p.m.

I tried to write a little bytecode verifier for RestrictedPython that guaranteed that dangerous opcodes -- like STORE_SUBSCR -- were guarded properly as the result of the compiler transformations and custom code generator in RestrictedPython. The simple opcodes like LOAD_ATTR and UNPACK_SEQUENCE are easy to handle; the rest was too big an effort for this evening.

To verify that an opcode like STORE_SUBSCR is called in a safe way, you have to verify that its container argument is either a checked object or a dict literal created with BUILD_MAP. Since the dict key can be an arbitrary expression, you need to do abstract interpretation with a simple type system to guarantee that the operation is safe. I was trying to do this in about 30 minutes, and didn't have time to simulate the type and stack effects of all the Python opcodes (ha!).
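The abstract interpretation only needs a toy type system: track whether each stack slot holds a value known to be checked. A sketch over a made-up micro-opcode set (only STORE_SUBSCR's stack effect matches real CPython; the load opcodes are invented stand-ins):

```python
SAFE, UNKNOWN = "safe", "unknown"

def verify(ops):
    """Abstractly execute ops; return False if STORE_SUBSCR can ever
    see a container that isn't known to be checked."""
    stack = []
    for op in ops:
        if op == "LOAD_CHECKED":        # result of a security-guard call
            stack.append(SAFE)
        elif op == "LOAD_RAW":          # e.g. an unguarded LOAD_ATTR result
            stack.append(UNKNOWN)
        elif op == "BUILD_MAP":         # a fresh dict literal is always safe
            stack.append(SAFE)
        elif op == "STORE_SUBSCR":      # TOS1[TOS] = TOS2
            key = stack.pop()
            container = stack.pop()
            value = stack.pop()
            if container is not SAFE:
                return False
    return True

# value, container, key are pushed in that order before STORE_SUBSCR:
ok = verify(["LOAD_RAW", "LOAD_CHECKED", "LOAD_RAW", "STORE_SUBSCR"])
bad = verify(["LOAD_RAW", "LOAD_RAW", "LOAD_RAW", "STORE_SUBSCR"])
```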

The goal of this project is to provide a complete check of the correctness of the transformation system. We can specify the basic security policy with respect to the bytecode, and verify that the generated bytecode honors this policy. That's a nice check that everything is working, but probably too ambitious for the current task.

Recognizing a Descriptor

permanent link
Thursday, December 11, 2003

If you ask a subclass of Persistent for its _p_oid attribute, you get a descriptor back; the descriptor's __get__ method returns self when called on the class. I had to fix ZODB's code for recognizing persistent objects, because it wasn't prepared to get a descriptor when it asked for an oid. This caused the error installing Formulator that Sidnei reported yesterday.

The fix is to specifically check for descriptors and ignore them when looking for an oid. I had fixed the problem once before in ZODB 4 (in Python code), but forgot about it when we did the same work for ZODB 3.3 (in C code).

The problem is: how do you recognize a descriptor? There is no API for this, in part because any object can be a descriptor. The best you can do is ask for a __get__ attribute; if the object has one, it must be a descriptor. That ends up being a lot of work:

    static PyObject *__get__;
    PyObject *descr;

    if (!__get__) {
        __get__ = PyString_InternFromString("__get__");
        if (!__get__)
            goto err;
    }
    descr = PyObject_GetAttr(oid, __get__);
    if (descr) {
        Py_DECREF(descr);
        goto return_none;
    }
    /* XXX should check that this was an AttributeError */
    PyErr_Clear();

You have to check whether __get__ exists, then clear the exception that occurs if it doesn't. In this particular case, I know the C type of the object (PyGetSetDescr_Type), but that name isn't part of the public API. In general, though, ZODB will support any object with an _p_oid, regardless of how it's implemented. ZODB could support persistent objects written in pure Python, where the type would be different.
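The same check is a one-liner in Python. A property makes a convenient test case, since asking the class (rather than an instance) for the attribute hands back the descriptor itself:

```python
class P:
    @property
    def oid(self):
        return 42

descr = vars(P)["oid"]              # the raw property object, not 42
assert hasattr(descr, "__get__")    # it quacks like a descriptor
assert not hasattr("a string", "__get__")  # ordinary values do not
assert P().oid == 42                # instance access goes through __get__
```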

Stackless Zope

permanent link
Thursday, December 11, 2003, 12:12 a.m.

Christian Tismer has a demo of Stackless Zope. It's a simple demo where the user code contains a for loop with a raw_input(). Each of those calls generates a new web form -- all inside the body of the for loop. The programmer's mental model and the code actually match up!
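Plain Python generators give a flavor of the same trick, though they capture only a single frame rather than a full continuation. A toy "wizard" where each yield stands in for rendering a form and waiting for the submission (my own sketch, nothing like Christian's actual code):

```python
def wizard():
    """Each yield suspends the loop until the next 'form submission'
    arrives via send(); the loop state is preserved in between."""
    answers = []
    for question in ["What is your name?", "What is your favorite color?"]:
        reply = yield question          # 'display the form', wait for the reply
        answers.append(reply)
    yield "Hello %s, you like %s." % (answers[0], answers[1])

session = wizard()
page1 = next(session)             # render the first form
page2 = session.send("Jeremy")    # submit it, get the second form
page3 = session.send("blue")      # submit again, get the final page
```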

If I follow this correctly, Christian has built a continuation-based web programming environment for Python. Shriram Krishnamurthi has been saying this is the essence of web programming. I've been meaning to read and think more about it, but haven't found the time. I think the paper to start with is:

Graunke, Krishnamurthi, Van der Hoeven and Felleisen. Programming the Web with High-Level Programming Languages. ESOP 2001.

Queinnec is working on this too: Continuations allow web applications to be written in direct style, that is, as a single program that displays forms and reads form submissions, since continuations automatically capture everything (control point, lexical bindings, etc.) that is needed to resume the computation. Programming is therefore safer, easier and more re-usable.

Aaron Watters's new xsdb uses Stackless for its server mode. Why? On the Stackless list, Aaron wrote: Unrolling the application at each point where you need to wait simply turns the code inside out (just like unrolling a set of recursions). Threading is the only acceptable alternative.

Aaron also says: I really think stackless type functionality makes a tremendous amount of sense and I still wish it would get folded into standard Python. It's worth considering again. I don't know if anyone has the time and energy to make it happen.

Concurrent Updates in Versions Fixed

permanent link
Friday, December 19, 2003

I fixed the two remaining test failures on the MVCC branch today, which means we are very close to merging it with the trunk. There were actually two bugs, one trivial, the other somewhat tricky. Both were in the handling of multiple revisions of objects in versions.

The easy bug was that the MVCC code should not run in a connection that is using a version. We restricted the new loadBefore() method to return only non-version data in order to keep things simple. The connection was still trying to use it, and getting bogus non-version data. Easy to fix.

The trickier bug was provoked by the following scenario involving commits by two concurrent connections. Each connection is using a different version. They both operate on a BTree object locked in the first version. The 2nd-version connection loads and modifies the object, reading the non-version data.

Normally the 2nd-version transaction would fail with a locked version error when it commits, but in the failure case the first connection committed the version between the start and commit of the other transaction. That ordering produces a resolvable conflict, except that conflict resolution was seeing the version data for the original transaction when it should have seen the non-version data.

The conflict is resolvable because the commit version added a key and the current transaction is adding a different key. As a result, the merged tree should have both keys. If, on the other hand, the version data is seen for the original transaction, then it will appear like the committed transaction had no effect: it added a key that is already present. Conflict resolution will think that the conflicting transaction is okay and commit its state, erasing the key added by commit version.
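Schematically, this kind of conflict resolution is a three-way merge between the state both transactions started from, the committed state, and the conflicting state. A much-simplified dict-based version (the real BTrees resolution code is in C and far more careful):

```python
class ConflictError(Exception):
    pass

def resolve(old, committed, new):
    """Merge two sets of additions made from the same starting state `old`."""
    committed_adds = set(committed) - set(old)
    new_adds = set(new) - set(old)
    if committed_adds & new_adds:
        # both transactions touched the same key: unresolvable here
        raise ConflictError("conflicting additions")
    merged = dict(committed)
    for k in new_adds:
        merged[k] = new[k]
    return merged

# Each transaction added a distinct key, so the merge keeps both:
merged = resolve(old={}, committed={"a": 1}, new={"b": 2})
# But if `old` wrongly already contains "a" (version data instead of
# non-version data), the committed addition looks like a no-op and a
# naive merge can silently drop it.
```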

It was a bug in loadSerial(): it returned version data when the serial number matched a version transaction, but it should always return non-version data. One perplexing loose end is that the Berkeley storage has apparently always had this bug. Perhaps Berkeley is so slow it never really provoked conflicts?

More hours burned supporting version trickery. Yesterday, Jim mentioned his latest idea for implementing versions without direct storage support: a wrapper or adapter that manages its own persistent storage to supplement the base storage. The wrapper implements the locking features necessary for versions -- or for an improved version-like feature. The benefit is that it only gets implemented once.

I think version locks need to be implemented at the storage level in order to get them right. I'm not sure if you can get the right effect at the application-level or not. (I think there was a thread about this on zodb-dev a year or so ago.)

The other big problem with tossing versions in the bin is that it would be a lot of work to remove them from the code. It's certainly not in scope for ZODB 3.3.