Python optimization (was Python's "only one way to do it" philosophy isn't good?)

Diez B. Roggisch deets at nospam.web.de
Mon Jun 11 03:27:35 EDT 2007


>    It's hard to optimize Python code well without global analysis.
> The problem is that you have to make sure that a long list of "weird
> things", like modifying code or variables via getattr/setattr, aren't
> happening before doing significant optimizations.  Without that,
> you're doomed to a slow implementation like CPython.
> 
>    ShedSkin, which imposes some restrictions, is on the right track here.
> The __slots__ feature is useful but doesn't go far enough.
> 
>    I'd suggest defining "simpleobject" as the base class, instead of 
> "object",
> which would become a derived class of "simpleobject".   Objects descended
> directly from "simpleobject" would have the following restrictions:
> 
>     - "getattr" and "setattr" are not available (as with __slots__)
>     - All class member variables must be initialized in __init__, or
>       in functions called by __init__.  The effect is like __slots__,
>       but you don't have to explicitly write declarations.
>     - Class members are implicitly typed with the type of the first
>       thing assigned to them.  This is the ShedSkin rule.  It might
>       be useful to allow assignments like
> 
>         self.str = None(string)
> 
>       to indicate that a slot holds strings, but currently has the null
>       string.
>     - Function members cannot be modified after declaration.  Subclassing
>       is fine, but replacing a function member via assignment is not.
>       This allows inlining of function calls to small functions, which
>       is a big win.
>     - Private function members (self._foo and self.__foo) really are
>       private and are not callable outside the class definition.
> 
> You get the idea.  This basically means that "simpleobject" objects have
> roughly the same restrictions as C++ objects, for which heavy compile time
> optimization is possible.  Most Python classes already qualify for
> "simpleobject".  And this approach doesn't require un-Pythonic stuff like
> declarations or extra "decorators".
> 
> With this, the heavy optimizations are possible.  Strength reduction.
> Hoisting common subexpressions out of loops.  Hoisting reference count
> updates out of loops.  Keeping frequently used variables in registers.
> And elimination of many unnecessary dictionary lookups.
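
To make that last item concrete (this is ordinary present-day Python, the
function names are made up): the kind of loop-invariant dictionary lookup
that people hoist by hand today is exactly what such a compiler could
hoist automatically.

    import math

    def total_slow(points):
        s = 0.0
        for p in points:
            # math.sqrt is looked up in the module's dict on every
            # iteration, even though it never changes
            s += math.sqrt(p.x * p.x + p.y * p.y)
        return s

    def total_hoisted(points):
        sqrt = math.sqrt    # the lookup, hoisted out of the loop by hand
        s = 0.0
        for p in points:
            s += sqrt(p.x * p.x + p.y * p.y)
        return s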


I won't give you the "prove it by doing it" talk. It's too cheap.

Instead I'd like to say why I don't think that this will buy you much
performance-wise: it's a local optimization only. All it can and will do
is optimize lookups and storage of attributes - either functions or
values - and calls to methods from within one simpleobject. As long as
expressions stay in their own "soup", things might be OK.
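
Purely for illustration (there is no "simpleobject" base class in
CPython, and the names below are made up), a class that already plays by
the proposed rules might look like this:

    class Point:            # hypothetically: class Point(simpleobject)
        def __init__(self, x, y):
            # every attribute is bound here, each with a fixed type,
            # so a compiler could lay the instance out like a C struct
            self.x = float(x)
            self.y = float(y)
            self.label = ""     # always a str from now on

        def scaled(self, factor):
            # small method that is never rebound, so calls to it could
            # be inlined and the attribute loads become fixed offsets
            return Point(self.x * factor, self.y * factor)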

The very moment you mix this with "regular", no-strings-attached Python
code, you have to have the full dynamic machinery in place, plus you need
tons of guarding statements in the optimized code to prevent access
violations.
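
A made-up but perfectly legal example of what the optimizer would have to
guard against:

    class Account:
        def __init__(self, balance):
            self.balance = balance
        def fee(self):
            return 1.0

    def audit_fee(self):
        return 0.0

    acct = Account(100.0)
    # Ordinary Python allows rebinding the method on the class at any
    # time, so a compiler that had inlined acct.fee() earlier would now
    # be wrong unless every call site carries a guard.
    Account.fee = audit_fee
    print(acct.fee())   # 0.0, not the 1.0 an inlined call would give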

So in the end, I seriously doubt the performance gains would be noticeable.
Instead I'd rather take the Pyrex road, which can go even further with
optimization, given some more declarations. But then I at least know exactly
where the boundaries are. As does the compiler.
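
Roughly, the Pyrex approach looks like this (a sketch of a .pyx file,
names made up; note this is the Pyrex dialect, not plain Python):

    # the cdef declarations tell the Pyrex compiler to use plain C
    # doubles and ints, so the arithmetic below needs no Python
    # object overhead
    cdef double cube(double x):
        return x * x * x

    def sum_cubes(int n):
        cdef int i
        cdef double total
        total = 0.0
        for i in range(n):
            total = total + cube(i)
        return total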

> Python could get much, much faster.  Right now CPython is said to be 60X
> slower than C.  It should be possible to get at least an order of
> magnitude improvement over CPython.


Regardless of the possibility of speeding it up - why should one want
this? Coding speed is more important than execution speed in 90%+ of all
cases. For the other ones - well, if you _really_ want speed, assembler is
the way to go. I'm serious about that. There is one famous mathematical
library author who codes in assembler - because in the end, it's all
about processor architecture and careful optimization for it. [1]

The same is true for e.g. the new Cell architecture, or the
AltiVec-optimized code in Photoshop that on PPC machines still beats the
crap out of Intel processors.

I'm all for making Python faster if it doesn't suffer
functionality-wise. But until there is proof that something really
speeds up Python without crippling it, I'm more than skeptical.

Diez

[1] http://math-atlas.sourceforge.net/faq.html#auth

"""
  Kazushige Goto
     His ev5/ev6 GEMM is used directly by ATLAS if the user answers 
"yes" to its use during the configuration procedure on an alpha 
processor. This results in a significant speedup over ATLAS's own GEMM 
codes, and is the fastest ev5/ev6 implementation we are aware of.
"""


