Jeremy Hylton :
weblog :
December 2003
| last modified Thu Mar 17 01:11:16 2005
| |
"Groovy" Language | permanent link |
Seen on the ll1 list: Groovy is consise & powerful dynamically typed OO language for the JVM. . Python benefits a lot from having good builtin datatypes. If Java just added those, how much better would it be?
One of the examples from the wiki looks like the kind of Java code I would probably write if I tried to program in Java (except that I wouldn't use a list to store a heterogeneous collection).
class MyTest extends GroovyTestCase {
private foo = new Foo()
void testFoo() throws Exception {
map = ['foo':'abc', 'bar':123, 'xyz':'edam']
list = [1.23, 1234, 'hello']
foo.doSomething(map, list);
}
}
I wonder what the future of scripting languages for Java is. Jython is a good choice, but it isn't a trivial migration from Java to Python, and the developer community is too small. A Groovy-like language has a smaller migraton path, but it's another new language. How good does a new language need to be to gain traction?
An Interpreter Prompt on the AST Branch | permanent link |
I made a little progress on the AST branch today: By fixing a few bugs and disabling use of sre on startup, I was able to bring up a normal interactive interpreter session. I don't have much time to work on Python, but it would be really great if the AST branch could be finished for Python 2.4.
I had been meaning to finish support for *varargs and **kwargs for months, but never found the time. It was all pretty simple, once I got back into the code. The only hitch was to add flags to the symbol table, which the assembler reads to set the code object flags for CO_VARARGS and CO_VARKEYWORDS.
Getting sre to compile correctly will be a good stress test now that there's minimal interactive support. It has several very large functions. At the moment, _parse() is failing at runtime.
Another good test would be to get setup.py to execute correctly, so that a make will finish without error.
One random idea for redesigning the AST has occurred to me a few times: Put the defaults argument values in a separate AST node that contains the FunctionDef node. Right now it's a real hassle to evaluate the defaults before pushing a new code block. If the defaults were the parent node, then the FunctionDef's children would all be executed in the function's environment.
Python Bounties | permanent link |
Mark Shuttleworth is offering to fund Python-related projects. He has offered a bounty for Python Scripting Everywhere: scripting interfaces for applications like OpenOffice, Blender, AbiWord, Gnumeric, and the GIMP.
Python is a great glue language. Many of my favourite tools are already scriptable in Python. I'm open to requests for funding work that needs to be done to make Python the most widespread common scripting language on the net.
The 2004 budget for bounties is $100,000. He is also soliciting proposals for school administration software in Python and work on Mozilla.
Mark founded Thawte and sold it to Verisign. He was also the first African in Space. (Probably the first Python programmer, too.)
Working Out Kinks in Cache Simulation | permanent link |
I've spent a lot of time this week working out bugs in the cache simulations. Many of the results reported in earlier blog entries -- both on cache access performance and on 2Q simulation results were wrong. That seems to be par for the course: The cache simulations are sometimes complicated. Understanding simulation results means checking the results for consistency ten different ways. Eventually you understand what the numbers are telling you and how they relate to each other.
The new 2Q implementation is probably correct, but doesn't produce the dramatically good results reported earlier. It's still better than LRU by a non-trivial margin (5% difference in hit rate) for large caches.
The ARC algorithm has been very difficult to implementation for a cache with variable-sized objects. Much of the specific logic from their pseudocode is different when you are calculating with object sizes instead of pages. For example, if you add a new object, the object you choose to evict may be smaller, so you have to evict several objects.
The other interesting problem with the ARC cache is that it seems to adjust its p factor too slowly. In the paper, they say that 1 results in changes that are fast enough, but they only have tiny caches in the simulation. For a very large cache (like a 500MB ZEO cache), it seems to adapt far too slowly. The paper boasts that there are no tuning parameters in ARC, but I needed to tune the adjustment rate to get decent results.
We're still stuck trying to find a decent implementation of an LRU list when the objects in the list are stored on disk and not in memory. The Iyengar paper doesn't cover replacements at all, even though it seems to say that a cache is typically in an equilibrium where it must always do replacement.
For the record, here is apparently correct simulation results for LRU and 2Q on the zope.org cache trace.
LRUCacheSimulation, cache size 50,000,000 bytes START TIME DURATION LOADS HITS INVALS WRITES HITRATE EVICTS Nov 25 00:36 2:19:55 51340 15399 47 116 30.0% 28391 Nov 25 02:56 1:55:01 54952 14318 26 442 26.1% 31652 Nov 25 04:51 1:30:01 72530 23387 45 97 32.2% 41593 Nov 25 06:21 1:00:13 46689 15161 16 111 32.5% 25083 Nov 25 07:21 1:45:09 53628 17812 46 143 33.2% 28617 Nov 25 09:06 1:15:02 58238 19730 23 163 33.9% 30879 Nov 25 10:21 2:05:01 126466 36876 527 4751 29.2% 84047 Nov 25 12:26 1:15:03 51919 15501 22 68 29.9% 25989 Nov 25 13:41 2:45:03 69965 17362 107 296 24.8% 45034 Nov 25 16:26 4:19:34 37131 11960 3 38 32.2% 21452 Nov 25 20:46 6:05:06 110716 26365 140 2605 23.8% 78027 Nov 26 02:51 2:04:57 79765 24191 438 1157 30.3% 48735 Nov 26 04:56 40:01 53910 16997 63 187 31.5% 28491 Nov 26 05:36 1:15:11 51677 14870 8 456 28.8% 30886 Nov 26 06:51 1:00:08 51355 13124 16 92 25.6% 30407 Nov 26 07:51 1:04:56 51923 16151 5 68 31.1% 28086 Nov 26 08:56 1:05:13 66551 21848 12 90 32.8% 37771 Nov 26 10:01 50:04 48137 17088 6 80 35.5% 22723 Nov 26 10:51 28:00 40506 15784 3 70 39.0% 16731 Nov 25 00:36 34:43:38 1177398 353924 1553 11030 30.1% 684594 OVERALL TwoQSimluation, cache size 50,000,000 bytes START TIME DURATION LOADS HITS INVALS WRITES HITRATE EVICTS HOTHIT Nov 25 00:36 2:19:55 51340 14223 71 116 27.7% 29929 90 Nov 25 02:56 1:55:01 54952 17129 28 442 31.2% 36735 5932 Nov 25 04:51 1:30:01 72530 29424 52 97 40.6% 43540 13274 Nov 25 06:21 1:00:13 46689 19995 17 111 42.8% 27212 7724 Nov 25 07:21 1:45:09 53628 19438 46 143 36.2% 34643 7832 Nov 25 09:06 1:15:02 58238 22877 15 163 39.3% 34734 9308 Nov 25 10:21 2:05:01 126466 45287 432 4751 35.8% 81699 17469 Nov 25 12:26 1:15:03 51919 18809 19 68 36.2% 28539 7551 Nov 25 13:41 2:45:03 69965 14727 107 296 21.0% 59106 11076 Nov 25 16:26 4:19:34 37131 15132 2 38 40.8% 23250 4213 Nov 25 20:46 6:05:06 110716 27964 121 2605 25.3% 82201 16686 Nov 26 02:51 2:04:57 79765 28798 292 1157 36.1% 50332 12103 Nov 26 04:56 40:01 53910 20701 60 187 38.4% 32999 7766 Nov 26 05:36 1:15:11 51677 19818 9 456 38.3% 31505 6998 Nov 26 06:51 1:00:08 51355 18963 11 92 36.9% 30628 8204 Nov 26 07:51 1:04:56 51923 21585 6 68 41.6% 32241 7465 Nov 26 08:56 1:05:13 66551 27667 13 90 41.6% 38881 9298 Nov 26 10:01 50:04 48137 20918 7 80 43.5% 26887 6758 Nov 26 10:51 28:00 40506 19589 4 70 48.4% 20435 5447 Nov 25 00:36 34:43:38 1177398 423044 1312 11030 35.9% 745496 165194 OVERALL
The ARC simulator is barely functional. There are a few corner cases where I don't understand what it's doing, so I doubt it's right. I'm also ignoring invalidations and writes, but it's producing some basic results that make some sense. It looks like it's favoring the LRU (L2) component over the FIFO component (L1) at the expense of performance. When I compare to 2Q, which is using more space for the FIFO component, I see that ARC has more LRU hits but many fewer FIFO hits.
It's difficult to say if these results are representative of ARC or are caused by bugs in my implementation. (Likely the latter given my recent track record and the hour.)
The first chart is with a "walk factor" of 500. If it needs to adjust p, it does so using the basic algorithm multiplied by the walk factor.
ARCCacheSimulation, cache size 50,000,000 bytes START TIME DURATION LOADS HITS INVALS WRITES HITRATE LRUTHITS EVICTS P Nov 25 00:36 2:19:55 51340 13001 0 116 25.3% 170 32170 2332000 Nov 25 02:56 1:55:01 54952 17790 0 442 32.4% 8280 33913 1145500 Nov 25 04:51 1:30:01 72530 18813 0 97 25.9% 8983 55903 5223500 Nov 25 06:21 1:00:13 46689 15301 0 111 32.8% 3566 31520 5353500 Nov 25 07:21 1:45:09 53628 15866 0 143 29.6% 4885 35772 6715500 Nov 25 09:06 1:15:02 58238 16603 0 163 28.5% 4693 42441 6187500 Nov 25 10:21 2:05:01 126466 38843 0 4751 30.7% 18441 90298 508500 Nov 25 12:26 1:15:03 51919 11498 0 68 22.1% 2424 38796 1910000 Nov 25 13:41 2:45:03 69965 10883 0 296 15.6% 8175 57130 7010000 Nov 25 16:26 4:19:34 37131 9392 0 38 25.3% 2146 23714 8427500 Nov 25 20:46 6:05:06 110716 32635 0 2605 29.5% 31363 83417 4433000 Nov 26 02:51 2:04:57 79765 22837 0 1157 28.6% 8656 58531 5609000 Nov 26 04:56 40:01 53910 14455 0 187 26.8% 3025 38794 6279000 Nov 26 05:36 1:15:11 51677 14862 0 456 28.8% 4006 36383 7866000 Nov 26 06:51 1:00:08 51355 13421 0 92 26.1% 2902 37051 10099500 Nov 26 07:51 1:04:56 51923 15462 0 68 29.8% 2922 38083 9251000 Nov 26 08:56 1:05:13 66551 21951 0 90 33.0% 4321 44048 9926000 Nov 26 10:01 50:04 48137 16766 0 80 34.8% 3382 31469 8320000 Nov 26 10:51 28:00 40506 15277 0 70 37.7% 3165 23202 7099500 Nov 25 00:36 34:43:38 1177398 335656 0 11030 28.5% 125505 832635 7099500 OVERALL
The couple of places where ARC was competitive with 2Q, it was were the LRU hit rate was very high. Everywhere else 2Q appeared to win by devoting more space to the LRU component. (This over simplifies, because 2Q is also more hesitant to promote to the LRU component.) So I tried changing the walk factor to 2000, since ARC is never using much of the cache for FIFO. It showed a small improvement, but not much.
ARCCacheSimulation, cache size 50,000,000 bytes START TIME DURATION LOADS HITS INVALS WRITES HITRATE LRUTHITS EVICTS P Nov 25 00:36 2:19:55 51340 13160 0 116 25.6% 163 32042 9064000 Nov 25 02:56 1:55:01 54952 16400 0 442 29.8% 5725 35860 10772000 Nov 25 04:51 1:30:01 72530 23346 0 97 32.2% 5752 50476 12750000 Nov 25 06:21 1:00:13 46689 15478 0 111 33.2% 2868 31651 10224000 Nov 25 07:21 1:45:09 53628 16411 0 143 30.6% 4215 35884 16850000 Nov 25 09:06 1:15:02 58238 19977 0 163 34.3% 4066 38113 16094000 Nov 25 10:21 2:05:01 126466 38994 0 4751 30.8% 16923 92417 14000 Nov 25 12:26 1:15:03 51919 12217 0 68 23.5% 3050 37720 5578000 Nov 25 13:41 2:45:03 69965 11217 0 296 16.0% 5612 58317 20374000 Nov 25 16:26 4:19:34 37131 13122 0 38 35.3% 1238 26363 20258000 Nov 25 20:46 6:05:06 110716 24968 0 2605 22.6% 22161 84526 11990000 Nov 26 02:51 2:04:57 79765 22640 0 1157 28.4% 7114 57781 9536000 Nov 26 04:56 40:01 53910 14579 0 187 27.0% 2378 38769 13806000 Nov 26 05:36 1:15:11 51677 16014 0 456 31.0% 3039 35352 18124000 Nov 26 06:51 1:00:08 51355 14413 0 92 28.1% 1852 35752 28536000 Nov 26 07:51 1:04:56 51923 16757 0 68 32.3% 1785 35976 26386000 Nov 26 08:56 1:05:13 66551 23303 0 90 35.0% 2963 40329 25508000 Nov 26 10:01 50:04 48137 17391 0 80 36.1% 1817 32920 20414000 Nov 26 10:51 28:00 40506 16211 0 70 40.0% 2987 24224 17756000 Nov 25 00:36 34:43:38 1177398 346598 0 11030 29.4% 95708 824472 17756000 OVERALL
Boosting the walk factor to 10,000 gets the overall hit rate up to 31.4%, a little better than LRU and still worse than 2Q.
Generating Custom Bytecode for Restricted Python | permanent link |
One flaw in RestrictedPython was unpacking of tuples, lists, and other
iterables in assignment statements, e.g. x, y = L. I
finished the first cut at a solution today that uses the Python
compiler package to generate custom bytecode that checks unpack
operations.
This check is different than the others in RestrictedPython, because it generates custom bytecode instead of re-writing the source tree. Most of the other changes replace operations that would use a bytecode operation like LOAD_ATTR with a call to a Python function that has the same behavior, but also checks security. It's impractical to implement sequence unpacking checkins that way, because it would be difficult to generate new source code that had the same effect. The unpacking can occur in the target of a for list or list comprehension or in the argument list of a function; they would all require very substantial source-to-source transformations.
The bytecode generation is mostly straightforward. One problem is that the visitor method that generates unpack sequence is called after the value being unpacked is already on the stack. The visitor method needs to do a ROT_TWO to get the function and its argument on the stack in the right order.
I need to re-organize the code generator classes in the compiler package. There is a lot of complexity that exists primarily to allow the compiler to generate code with or without nested scopes. In Python 2.3, there's no need to support both variants anymore. It's also complex because it avoids circular dependencies at the class level be initializing references in an initClass() method.
The particular problem for this project was that RCompile extends all the top-level compile mode classes, i.e. "single", "eval", and "exec," and the function code generator. The compile mode classes are all related by inheritance with an abstract base class that defines stub methods that subclasses must override. I want to provide an implementation of one of those stub methods that all the subclasses share, which proved difficult. It was also hard to connect the various code generators, because of the initClass() magic.
IronPython performance looks good | permanent link |
Jim Hugunin is at it again: The early results with IronPython show that Python compiled to IL can run fast. On the standard pystone benchmark, IronPython-0.1 is 70% faster than CPython-2.3.
An earlier Python for .NET project at ActiveState ran slowly, but the project report didn't offer a convincing explanation for the slowness: The Microsoft support staff used their internal tools to determine that the IPyType interface lookup for a given .NET object is the biggest hot spot in the runtime. That is, the analysis was disconnected from the development.
Jim's message to the dotnet-language-dev group includes the following microbenchmark results:
IronPython-0.1 Python-2.3 Python-2.1 Jython-2.1
pystone 0.58 1.00 1.29 1.61
function call 0.19 1.00 1.12 1.33
integer add 0.59 1.00 1.18 1.08
string.replace 0.92 1.00 1.00 1.40
range(bigint) 5.57 1.00 1.09 16.02
eval("2+2") 66.97 1.00 1.58 91.33
It's nice to see that function calls are so much faster. There's a lot of overhead in CPython for function calls -- pushing arguments on the stack, copying them into the callees frame, finding the right builtins for the callee frame, initializing the rest of the frame. It seems like their ought to be a simpler way, at least in the common case.
More Questions and Answers for the ZEO Cache | permanent link |
The work on the ZEO cache has been delayed by some other customer work at Zope Corp. I wanted to capture the current state of our efforts for when we get back to it.
PyCon DC 2004 | permanent link |
The Python conference will be held March 20-26, 2004 at George Washington University in Washington, D.C. PyCon registration is open, and the final call for papers will be posted next week. Mitch Kapor is the keynote speaker.
The last PyCon was a big success in my book. See Mike Orr's conference report in Linux Journal for a nice recap. There was a nice diversity of talks, but I liked the pre-conference sprint and the open space even better. It was a nice chace to hack with Neal Norwitz and g Neil Schemenauer. It looks like there will be a good set of sprints for 2004, too, including Twisted, Zope2 and Zope3, and PyPy. More on the sprints soon.
The other great thing about PyCon is the cost. It's $175 for the three conference days if you register early and the pre-conference sprints are free. The only real costs are travel and lodging in DC.
A New Gadfly | permanent link |
And Aaron Watters is at it again, too. The xsdb package provides fundamental concurrent database functionality with concurrency control and recovery.
There was a stealth reference to it on comp.lang.python last week, which I missed. The real announcement reached the Stackless list this afternoon.
A Toy Bytecode Verifier | permanent link |
I tried to write a little bytecode verifier for RestrictedPython that guaranteed that dangerous opcodes -- like STORE_SUBSCR -- were guarded properly as the result of the compiler transformations and custom code generator in RestrictedPython. The simple opcodes like LOAD_ATTR and UNPACK_SEQUENCE are easy to handle; the rest was too big an effort for this evening.
To verify that an opcod like STORE_SUBSCR is called in a safe way, you have to verify that it's container argument is either a checked object or a dict literal created with BUILD_MAP. Since the dict key can be an arbitrary expression, you need to do abstract interpretation with a simple type system to guarantee that the operation is safe. I was trying to do this in about 30 minutes, and didn't have time to simulate the type and stack effects on all the Python opcodes (ha!).
The goal of this project is to provide a complete check of the correctness of the transformation system. We can specify the basic security policy with respect to the bytecode, and verify that the generated bytecode honors this policy. That's a nice check that everything is working, but probably too ambitious for the current task.
Recognizing a Descriptor | permanent link |
If you ask a subclass of Persistent for its _p_oid attribute, you will get a descriptor back; the descriptor's __get__ method returns self when called for the class. I had to fix ZODB's code for recognizing persistent objects, because it wasn't prepare to get a descriptor when it asked for an oid. This caused the error installing Formulator that Sidnei reported yesterday.
The fix is to specifically check for descriptors and ignore them when looking for an oid. I had fixed the problem once before in ZODB 4 (in Python code), but forgot about it when we did the same work for ZODB 3.3 (in C code).
The problem is how do you recognize a descriptor? There is no API for this, in part because any object can be a descriptor. The best you can do is ask for an __get__ attribute. If it has one, it must be a descriptor. That ends up being a lot of work:
static PyObject *__get__;
PyObject *descr;
if (!__get__) {
__get__ = PyString_InternFromString("__get__");
if (!__get__)
goto err;
}
descr = PyObject_GetAttr(oid, __get__);
if (descr) {
Py_DECREF(descr);
goto return_none;
}
/* XXX should check that this was an AttributeError */
PyErr_Clear();
You have to check whether the __get__ exists, then clear an exception
that occurs if it doesn't exist. In this particular case, I know the
C type of the object (PyGetSetDescr_Type), but that name isn't a part
of the public API. In general, though, ZODB will support any object
with an _p_oid, regardless of how it's implemented. ZODB could
support persistent objects written in pure Python, where the type
would be different.
Stackless Zope | permanent link |
Christian Tismer has a demo of Stackless Zope. It's a simple demo where the user code contains a for loop with a raw_input(). Each of those calls generates a new web form -- all inside the body of the for loop. The programmer's mental model and the code actually match up!
If I follow this correctly, Christian has built a continuation-based web programming environment for Python. Shriram Krishnamurthi has been saying this is the essence of web programming. I've been meaning to read and think more about it, but haven't found the time. I think the paper to start with is:
Graunke, Krishnamurthi, Van der Hoeven and Felleisen. Programming the Web with High-Level Programming Languages. ESOP 2001.
Aaron Watter's new xsdb uses Stackless to for its server mode. Why? On the Stackless list, Aaron wrote: Unrolling the application at each point where you need to waitsimply turns the code inside out (just like unrollinga set of recursions). Threading is the onlyacceptable alternative.
Aaron also says: I really think stackless type functionality makes a tremendous amount of senseand I still wish it would get folded into standard Python. It's worth considering again. I don't know if anyone has the time and effort to make it happen.
Concurrent Updates in Versions Fixed | permanent link |
I fixed the two remaining tests on the MVCC branch today, which means we are very close to merging it with the trunk. There were actually two bugs, one trivial, the other somewhat tricky. The bugs were all in handling multiple revisions of objects in versions.
The easy bug was the MVCC code should not run in a connection that is using a version. We restricted the new loadBefore() method to only return non-version data in order to keep things simple. The connection was still trying to use it, but getting bogus non-version data. Easy to fix.
The trickier bug was provoked by the following scenario involving commits by two concurrent connections. Each connection is using a different version. They both operate on a BTree object locked in the first version. The 2nd-version connection loads and modifies the object, reading the non-version data.
Normally the 2nd-version transaction would fail with a locked version error when it commits, but in the failure case the first connection committed the version between the start and commit of the other transaction. That ordering produces a resolvable conflict, except that conflict resolution was seeing the version data for the original transaction when it should have seen the non-version data.
The conflict is resolvable because the commit version added a key and the current transaction is adding a different key. As a result, the merged tree should have both keys. If, on the other hand, the version data is seen for the original transaction, then it will appear like the committed transaction had no effect: it added a key that is already present. Conflict resolution will think that the conflicting transaction is okay and commit its state, erasing the key added by commit version.
It was a bug in loadSerial(). It returned version data when the serial number matched a versioned transaction. It should always return non-version data. One perplexing loose end is that it looks like the Berkeley storage has always had this bug. Perhaps Berkeley is so slow it never really provoked conflicts?
More hours burned supporting version trickery. Yesterday, Jim mentioned his latest idea for implementing versions without direct storage support. His idea is to use a wrapper or adapter that manages its own persistent storage to supplement the base storage. The wrapper implements the locking features necessary for versions -- or for an improved version-like feature. The benefit is that it only gets implemented once.
I think version locks need to be implemented at the storage level in order to get them right. I'm not sure if you can get the right effect at the application-level or not. (I think there was a thread about this on zodb-dev a year or so ago.)
The other big problem with tossing versions in the bin is that it would be a lot of work to remove them from the code. It's certainly not in scope for ZODB 3.3.