[Python-Dev] 2.0 Optimization & speed

M.-A. Lemburg mal@lemburg.com
Fri, 08 Sep 2000 18:49:58 +0200


Vladimir Marangozov wrote:
> 
> Continuing my impressions on the user's feedback to date: Donn Cave
> & MAL are at least two voices I've heard about an overall slowdown
> of the 2.0b1 release compared to 1.5.2. Frankly, I have no idea where
> this slowdown comes from and I believe that we have only vague guesses
> about the possible causes: unicode database, more opcodes in ceval, etc.
> 
> I wonder whether we are in a position to try improving Python's
> performance with some `wise quickies' in a next beta.

I don't think it's worth trying to optimize anything in the
beta series: optimizations need to be well tested and therefore
should go into 2.1.

Perhaps we ought to make these optimizations the big new issue
for 2.1...

It would fit well with the move to a more pluggable interpreter
design.

> But this raises
> a more fundamental question on what is our margin for manoeuvres at this
> point. This in turn implies that we need some classification of the
> proposed optimizations to date.
> 
> Perhaps it would be good to create a dedicated Web page for this, but
> in the meantime, let's try to build a list/table of the ideas that have
> been proposed so far. This would be useful anyway, and the list would be
> filled as time goes.
> 
> Trying to push this initiative one step further, here's a very rough start
> on the top of my head:
> 
> Category 1: Algorithmic Changes
> 
> These are the most promising, since they don't relate to pure technicalities
> but imply potential improvements with some evidence.
> I'd put in this category:
> 
> - the dynamic dictionary/string specialization by Fred Drake
>   (this is already in). Can this be applied in other areas? If so, where?
>
> - the Python-specific mallocs. Actually, I'm pretty sure that a lot of
>   `overhead' is due to the standard mallocs which happen to be expensive
>   for Python in both space and time. Python is very malloc-intensive.
>   The only reason I've postponed my obmalloc patch is that I still haven't
>   provided an interface which allows evaluating its impact on the
>   mem size consumption. It gives noticeable speedup on all machines, so
>   it accounts as a good candidate w.r.t. performance.
> 
> - ??? (maybe some parts of MAL's optimizations could go here)
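A quick aside on the Python-specific mallocs: most of the win comes
from recycling the many small, equal-sized blocks that Python
allocates and frees all the time, instead of going through
malloc()/free() for each of them. A stand-alone sketch of that
free-list idea (this is not the obmalloc patch itself, just an
illustration; the names and sizes are made up):

    #include <stdio.h>
    #include <stdlib.h>

    /* Illustration only: recycle fixed-size blocks through a private
       free list instead of calling malloc()/free() each time. */

    #define BLOCK_SIZE   32     /* size served by this free list */
    #define POOL_BLOCKS  256    /* blocks carved out per malloc() call */

    typedef union block {
        union block *next;      /* valid while on the free list */
        char payload[BLOCK_SIZE];
    } block;

    static block *free_list = NULL;

    static void *
    pool_alloc(void)
    {
        block *b;

        if (free_list == NULL) {
            /* Refill: grab one big chunk, chain it onto the free list. */
            block *chunk = (block *) malloc(POOL_BLOCKS * sizeof(block));
            int i;
            if (chunk == NULL)
                return NULL;
            for (i = 0; i < POOL_BLOCKS - 1; i++)
                chunk[i].next = &chunk[i + 1];
            chunk[POOL_BLOCKS - 1].next = NULL;
            free_list = chunk;
        }
        b = free_list;
        free_list = b->next;
        return (void *) b;
    }

    static void
    pool_free(void *p)
    {
        /* Freed blocks go back on the free list; no free() call at all. */
        block *b = (block *) p;
        b->next = free_list;
        free_list = b;
    }

    int
    main(void)
    {
        void *p = pool_alloc();
        void *q = pool_alloc();
        pool_free(p);
        pool_free(q);
        printf("allocated and recycled two %d byte blocks\n", BLOCK_SIZE);
        return 0;
    }

The actual obmalloc is of course far more elaborate than this, but
the pattern above is where the time is saved.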

One addition would be my small dict patch: for small dictionaries,
the entry table is embedded in the dictionary object itself instead
of being allocated as a separate buffer. This helps dictionaries
with up to 8-16 entries and gives a speedup because most instance
dictionaries are in fact of that size.
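
In code the idea looks roughly like this (a simplified, stand-alone
sketch, not the actual patch; names and sizes are invented and the
hashing/rehashing details are left out):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Sketch of the "small table" idea: the first N slots live inside
       the object itself, so small dicts need no second allocation. */

    #define SMALL_TABLE_SIZE 8

    typedef struct {
        long key;
        long value;
        int used;
    } entry;

    typedef struct {
        int size;                            /* slots in *table */
        entry *table;                        /* smalltable or heap buffer */
        entry smalltable[SMALL_TABLE_SIZE];  /* inline table, no malloc() */
    } smalldict;

    static void
    dict_init(smalldict *d)
    {
        d->size = SMALL_TABLE_SIZE;
        d->table = d->smalltable;            /* common case: no extra buffer */
        memset(d->smalltable, 0, sizeof(d->smalltable));
    }

    static void
    dict_grow(smalldict *d, int newsize)
    {
        /* Only dicts that outgrow the inline table pay for a malloc(). */
        entry *newtable = (entry *) calloc(newsize, sizeof(entry));
        if (newtable == NULL)
            abort();
        memcpy(newtable, d->table, d->size * sizeof(entry));
        /* (naive copy; a real dict would rehash the old entries
           into the larger table) */
        if (d->table != d->smalltable)
            free(d->table);
        d->table = newtable;
        d->size = newsize;
    }

    int
    main(void)
    {
        smalldict d;
        dict_init(&d);
        /* pretend we inserted one entry */
        d.table[0].key = 42; d.table[0].value = 1; d.table[0].used = 1;
        printf("slots: %d (inline: %s)\n", d.size,
               d.table == d.smalltable ? "yes" : "no");
        dict_grow(&d, 32);
        printf("slots: %d (inline: %s)\n", d.size,
               d.table == d.smalltable ? "yes" : "no");
        if (d.table != d.smalltable)
            free(d.table);
        return 0;
    }

The point is simply that the common case never touches malloc();
only dictionaries that outgrow the inline table pay for a second
allocation.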
 
> Category 2: Technical / Code optimizations
> 
> This category includes all (more or less) controversial proposals, like
> 
> - my latest lookdict optimizations (a typical controversial `quickie')
> 
> - opcode folding & reordering. Actually, I'm unclear on why Guido
>   postponed the reordering idea; it has received positive feedback
>   and all theoretical reasoning and practical experiments showed that
>   this "could" help, although without any guarantees. Nobody reported
>   slowdowns, though. This is typically a change without real dangers.

Rather than folding opcodes, I'd suggest breaking the huge
switch into two or three parts so that the most commonly used
opcodes fit nicely into the CPU cache.
 
> - kill the async / pending calls logic. (Tim, what happened with this
>   proposal?)

In my patched version of 1.5 I have moved this logic into the
second part of the ceval switch: as a result, signals and pending
calls are only checked when one of the less common opcodes is
executed.
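
Putting the two points together, the dispatch structure would look
roughly like this (a toy interpreter, not the real ceval.c; the
opcode names and numbers are invented):

    #include <stdio.h>

    /* Toy dispatch loop illustrating two ideas:
       - split the big opcode switch so the frequent opcodes sit in a
         small first switch (better CPU cache behaviour),
       - only poll for signals/pending calls on the rare path, before
         the second switch. */

    enum { OP_LOAD = 1, OP_ADD, OP_STORE,      /* "common" opcodes */
           OP_PRINT = 100, OP_STOP };          /* "rare" opcodes */

    static int things_to_do = 0;   /* stands in for the pending-call flag */

    static void
    run(const int *code)
    {
        long acc = 0, mem = 0;

        for (;;) {
            int op = *code++;

            /* First switch: the hot opcodes, dispatched without any
               signal/pending-call overhead. */
            switch (op) {
            case OP_LOAD:  acc = *code++;  continue;
            case OP_ADD:   acc += *code++; continue;
            case OP_STORE: mem = acc;      continue;
            }

            /* Only here, on the rare path, do we look at pending work. */
            if (things_to_do) {
                /* handle signals / pending calls ... */
                things_to_do = 0;
            }

            /* Second switch: everything else. */
            switch (op) {
            case OP_PRINT: printf("mem = %ld\n", mem); continue;
            case OP_STOP:  return;
            default:       printf("unknown opcode %d\n", op); return;
            }
        }
    }

    int
    main(void)
    {
        int code[] = { OP_LOAD, 20, OP_ADD, 22, OP_STORE, OP_PRINT, OP_STOP };
        run(code);
        return 0;
    }

The obvious trade-off is that signals are noticed less promptly
while a tight loop executes only common opcodes; that is the
semantic change to weigh against the speedup.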

> - compact the unicodedata database, which is expected to reduce the
>   mem footprint, maybe improve startup time, etc. (ongoing)

This was postponed to 2.1. It doesn't have any impact on
performance... not even on memory footprint, since the database
is only paged in on demand by the OS.
 
> - proposal about optimizing the "file hits" on startup.

A major startup speedup can be had by using a smarter file
lookup mechanism for the import machinery.
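
To make that a bit more concrete: the importer currently probes
several candidate file names (.so, .py, .pyc variants) in every
sys.path directory, and each miss costs a system call. Reading a
directory once and answering the existence checks from memory
avoids most of those calls. A POSIX-only sketch of the idea (not a
patch against import.c, just an illustration):

    #include <stdio.h>
    #include <string.h>
    #include <dirent.h>

    /* Cache a directory listing once, then answer "does this file
       exist?" questions from memory instead of stat()ing each
       candidate name. */

    #define MAX_ENTRIES 4096
    #define NAME_LEN    256

    static char cache[MAX_ENTRIES][NAME_LEN];
    static int cache_len = 0;

    static int
    load_dir(const char *path)
    {
        DIR *dir = opendir(path);
        struct dirent *ent;

        if (dir == NULL)
            return -1;
        cache_len = 0;
        while ((ent = readdir(dir)) != NULL && cache_len < MAX_ENTRIES) {
            strncpy(cache[cache_len], ent->d_name, NAME_LEN - 1);
            cache[cache_len][NAME_LEN - 1] = '\0';
            cache_len++;
        }
        closedir(dir);
        return 0;
    }

    static int
    in_dir(const char *name)
    {
        int i;
        /* One string compare per entry beats one stat() per candidate. */
        for (i = 0; i < cache_len; i++)
            if (strcmp(cache[i], name) == 0)
                return 1;
        return 0;
    }

    int
    main(void)
    {
        if (load_dir(".") < 0) {
            perror("opendir");
            return 1;
        }
        printf("string.py in cwd: %s\n", in_dir("string.py") ? "yes" : "no");
        printf("stringmodule.so in cwd: %s\n",
               in_dir("stringmodule.so") ? "yes" : "no");
        return 0;
    }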

Another possibility is freeze()ing the whole standard lib 
and putting it into a shared module. I'm not sure how well
this works with packages, but it did work very well for
1.5.2 (see the mxCGIPython project).
 
> - others?
> 
> If there are potential `wise quickies', maybe it's good to refresh
> them now and experiment a bit more before the final release?

No, let's leave this for 2.1.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/