Towards faster Python implementations - theory

Paul Boddie paul at boddie.org.uk
Tue May 8 12:29:38 EDT 2007


On 8th May, 17:53, John Nagle <n... at animats.com> wrote:
>     Some faster Python implementations are under development.
> JPython

Ahem: Jython!

> has been around for a while,  and PyPy and ShedSkin
> continue to move forward.  It's worth thinking about what slows
> down Python implementations.

It's the dynamicity! ;-) But things like clever memory allocation can
pay dividends, too, and I wouldn't be surprised if that explained
Python's better-than-expected showing in comparisons with other
languages - such as the one you provoked with those claims of superior
JavaScript performance. ;-)

>     It isn't the dynamism, really.

In theory, no, but in practice CPython doesn't optimise away the
dynamism. Recent experiments with method caches seem to have shown
modest performance improvements, and I'd guess that such things are
fairly established in other dynamic language implementations.
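For illustration, here is a toy, pure-Python sketch of the idea behind
such a method cache: lookups are memoised per (class, name) pair and
invalidated by a version tag that is bumped whenever the class changes.
All the names are invented for the sketch; the real cache lives down in
the C-level attribute lookup machinery, not in Python code like this.

```python
# Toy sketch of a (type, name) method cache with version-tag
# invalidation. Invented names throughout; not CPython's actual code.

_cache = {}

class Meta(type):
    def __new__(mcls, name, bases, ns):
        cls = super().__new__(mcls, name, bases, ns)
        cls._version = 0            # version tag for cache invalidation
        return cls

    def __setattr__(cls, name, value):
        super().__setattr__(name, value)
        if name != "_version":
            # Any class modification bumps the tag, orphaning old entries.
            type.__setattr__(cls, "_version", cls._version + 1)

def cached_lookup(cls, name):
    key = (cls, name, cls._version)
    try:
        return _cache[key]          # hit: skip the full lookup
    except KeyError:
        value = getattr(cls, name)  # miss: do the lookup, remember it
        _cache[key] = value
        return value

class C(metaclass=Meta):
    def greet(self):
        return "hello"

assert cached_lookup(C, "greet") is C.__dict__["greet"]
C.greet = lambda self: "changed"    # bumps the version tag
assert cached_lookup(C, "greet")(None) == "changed"
```

The point of the version tag is that invalidation costs nothing at
lookup time: stale entries are never consulted again because their key
no longer matches.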

>  As others have pointed out
> in the Python literature, most of the time, the more elaborate
> dynamic features aren't being used for any given variable or
> object.  If code has something like "x = 1.0", the odds are that
> "x" is going to stay a floating point number, and not suddenly turn
> into a list or an object reference.  The type inference of Shed Skin
> builds on that assumption, adding some restrictions about changing of
> variable types.

The problem here, amongst many others, is knowing for sure whether the
more elaborate features have been avoided. Approaches which sidestep
the need to know include just-in-time compilation (you defer the
question until run time) and type declarations (you give up wondering
whether it's possible and have the programmer do all the work).

>     The Shed Skin effort indicates that explicit typing, via 'decorators'
> or otherwise, isn't really necessary.  What's necessary is the avoidance
> of "surprises" in the language.  In this context, a "surprise" is
> the use of a dynamic feature in a way that can't be seen at compile time.

I concur with your assessment of the supposed necessity of explicit
typing. However, you might be surprised as to what constitutes a
"surprise" in Python...

>     A typical "surprise" would be the use of "setattr" on an object from
> outside the compilation unit that defines the object.  Within a module,
> "setattr" on an object in that module is no problem; the compiler can see
> it and generate the extra machinery needed to make an object dynamically
> alterable at run time.  But if an object doesn't need that extra machinery
> and associated dictionary, it's a big win to discard the excess baggage
> and use a simpler fixed-size representation, comparable to a C struct,
> for the object.
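A small example of that "surprise", using nothing beyond what Python
already offers: an ordinary instance's dictionary can be grown by
setattr from any code at all, whereas __slots__ gives exactly the
fixed, struct-like layout described above and rejects such additions.

```python
# An ordinary class: setattr from anywhere can grow the instance dict.
class Open:
    def __init__(self):
        self.x = 1.0

# A slotted class: fixed layout, no per-instance __dict__ to grow.
class Fixed:
    __slots__ = ("x",)
    def __init__(self):
        self.x = 1.0

o = Open()
setattr(o, "y", [1, 2, 3])      # fine: the per-instance dict just grows

f = Fixed()
try:
    setattr(f, "y", [1, 2, 3])  # no __dict__, no "y" slot
except AttributeError:
    pass                        # the fixed layout is preserved
```

A compiler hoping to use the Fixed-style representation for every class
has to prove that no code anywhere performs the Open-style setattr.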

You don't even need to bring out setattr to make the inference
activity a difficult one. Even straightforward attribute access needs
to be repeatedly checked to see if the target of a normal attribute
assignment or query has changed. Traditionally, people have shown some
trivial function in isolation...

  def f(x):
      return x.a

...and said, "We don't know anything! It's all impossible!" But
context is everything, as you know, and whole program analysis is the
only way to go with the aforementioned goals in mind. What if the
parameter to the above itself comes from attribute access?

  def g(y):
      return f(y.b)

You can descend into some fairly demanding situations with respect to
keeping track of all the possibilities.
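To make that concrete, here is one hypothetical whole program built
around the f and g above (the holder classes are invented for the
example). Two call sites make the attribute reached by f an int in one
case and a str in the other, so f cannot be specialised to a single
unboxed type without tracking which values flow in from where.

```python
class P:
    def __init__(self, b):
        self.b = b

class IntHolder:
    def __init__(self):
        self.a = 1        # .a is an int here...

class StrHolder:
    def __init__(self):
        self.a = "one"    # ...and a str here

def f(x):
    return x.a

def g(y):
    return f(y.b)

r1 = g(P(IntHolder()))    # f's x.a yields an int
r2 = g(P(StrHolder()))    # f's x.a yields a str
```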

>     On the typing front, the neatest way to express typing is via
> initialization.  With the Shed Skin restrictions, if all variables are
> initialized before use (preferably in __init__), there's no need to
> maintain an "undefined" flag for a variable.  And, of course, if
> the type of a variable is simple and can't change, it doesn't have to
> be "boxed", (enclosed in an object) which is a big win.

I'm fairly sure, although my intuition unfortunately doesn't provide a
ready example right now, that typing via initialisation, whilst an
important tool, may either impose unacceptable restrictions if
enforced too rigidly or limit the precision one might hope for in the
resulting type information. But identifying fixed-size structures and
unboxed values is, in my opinion, a worthwhile objective.
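Here is one plausible case (the class is invented for the sketch) where
naive typing-via-initialisation looks too rigid: judged only by
__init__, the attribute below would be typed as None, yet its intended
type is "int, once computed". An inferencer needs either a union or
optional type, or some flow-sensitivity, to keep precision here.

```python
class Job:
    def __init__(self):
        self.result = None      # placeholder; the "real" type comes later

    def run(self, n):
        self.result = n * n     # now an int, contradicting __init__
        return self.result

j = Job()
j.run(3)                        # j.result is now the int 9
```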

>     The point here is that we don't need language changes or declarations
> to make Python much faster.  All we need are a few restrictions that
> ensure that, when you're doing something unusual, the compiler can
> tell.

Agreed. And I don't buy into the negative "lesser Python" labelling of
such work, either. People seem to have forgotten how to use older,
conservative Python features, preferring to show off with metaclasses
and closures even for problems that could be solved using simple
classes and objects. If that's "greater Python" then call me a
minimalist!

Paul




More information about the Python-list mailing list