[Python-Dev] Optimization targets

Michael Hudson mwh at python.net
Thu Apr 15 07:05:37 EDT 2004


Mike Pall <mikepy-0404 at mike.de> writes:

> Hi,
>
> Raymond wrote:
>> It looks like the best bet is to try to speed up the code without
>> changing the multiplier.
>
> Indeed. And while you are at it, there are other optimizations that
> seem more promising:
>
> I compiled a recent CVS Python with profiling and here is a list of the
> top CPU hogs (on a Pentium III, your mileage may vary):

I played this game recently, but on a G3 iBook, which is probably a
much more boring processor from a scheduling point of view.

> pystone:
>
>   CPU%  Function Name
> ----------------------------
>  55.44  eval_frame
>   7.30  lookdict_string
>   4.34  PyFrame_New
>   3.73  frame_dealloc
>   1.73  vgetargs1
>   1.65  PyDict_SetItem
>   1.42  string_richcompare
>   1.15  PyObject_GC_UnTrack
>   1.11  PyObject_RichCompare
>   1.08  PyInt_FromLong
>   1.08  tupledealloc
>   1.04  insertdict

I saw similar results to this, tho' I don't remember lookdict_string
being so high on the list.

> parrotbench:
>
>   CPU%  Function Name
> ----------------------------
>  23.65  eval_frame
>   8.68  l_divmod
>   4.43  lookdict_string
>   2.95  k_mul
>   2.27  PyType_IsSubtype
>   2.23  PyObject_Malloc
>   2.09  x_add
>   2.05  PyObject_Free
>   2.05  tupledealloc
>
> Arguably parrotbench is a bit unrepresentative here. And beware: due to
> automatic inlining of static functions the real offender may be hidden
> (x_divmod is the hog, not l_divmod).

Probably a fine candidate function for rewriting in assembly too...

> Anyway, this just confirms that the most important optimization targets are:
> 1. eval_frame
> 2. string keyed dictionaries
> 3. frame handling

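As a rough cross-check of that ranking, the pystone shares quoted above
can be summed per target. This is only a sketch: the grouping of
functions into targets is my own assumption, not something stated in the
profile (PyDict_SetItem and insertdict, for instance, are not limited to
string keys), and the names PYSTONE, GROUPS and group_shares are made up
for illustration.

```python
# Shares copied from the pystone profile quoted above (percent of CPU time).
PYSTONE = {
    "eval_frame": 55.44,
    "lookdict_string": 7.30,
    "PyFrame_New": 4.34,
    "frame_dealloc": 3.73,
    "vgetargs1": 1.73,
    "PyDict_SetItem": 1.65,
    "string_richcompare": 1.42,
    "PyObject_GC_UnTrack": 1.15,
    "PyObject_RichCompare": 1.11,
    "PyInt_FromLong": 1.08,
    "tupledealloc": 1.08,
    "insertdict": 1.04,
}

# Assumed (hypothetical) mapping from optimization target to functions.
GROUPS = {
    "eval_frame": ["eval_frame"],
    "dictionaries": ["lookdict_string", "PyDict_SetItem", "insertdict"],
    "frame handling": ["PyFrame_New", "frame_dealloc"],
}


def group_shares(profile, groups):
    """Sum the per-function shares for each group, rounded to 2 places."""
    return {
        name: round(sum(profile[fn] for fn in functions), 2)
        for name, functions in groups.items()
    }


if __name__ == "__main__":
    for name, share in group_shares(PYSTONE, GROUPS).items():
        print("%-16s %6.2f%%" % (name, share))
```

Under this grouping the three targets come to about 55%, 10% and 8% of
pystone's CPU time, which matches the order of the list.
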
[...]

> But GCC has more to offer: read the man page entries for -fprofile-arcs
> and -fbranch-probabilities. Here is a short recipe:

I tried this on the iBook and I found that it made a small difference
*on the program you ran to generate the profile data* (e.g. pystone),
but made naff all difference for something else.  I can well believe
that it makes more difference on a P4 or G5.
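
One way to check for that effect is to time at least one workload that
was not used to generate the profile data, and run the same script under
each interpreter build. A minimal sketch follows; the two workloads and
the names dict_heavy, bigint_heavy and measure are invented here for
illustration and are not pystone or parrotbench.

```python
import timeit


def dict_heavy(n=2000):
    # String-keyed dictionary stores and lookups.
    d = {}
    for i in range(n):
        d["k%d" % i] = i
    return sum(d["k%d" % i] for i in range(n))


def bigint_heavy(n=200):
    # Big-integer divmod, a very different mix of interpreter work.
    x = 3 ** 400
    total = 0
    for i in range(1, n):
        _, remainder = divmod(x, 7 ** i)
        total += remainder
    return total


def measure(workloads, repeat=3, number=5):
    """Return the best-of-`repeat` wall time in seconds per workload."""
    return {
        name: min(timeit.repeat(fn, repeat=repeat, number=number))
        for name, fn in workloads.items()
    }


if __name__ == "__main__":
    timings = measure({"dict_heavy": dict_heavy, "bigint_heavy": bigint_heavy})
    for name, seconds in sorted(timings.items()):
        print("%-14s %.4f s" % (name, seconds))
```

If a profile-trained build only wins on the workload it was trained on,
the timings for the other workload should stay roughly level between
builds.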

[snippety]

> I bet there are more, but I'm running out of time right now --
> sorry.

I certainly don't want to discourage people from optimizing Python's
current implementation, but...

Some months ago (just after I played with -fprofile-arcs, most likely)
I wrote a rant about improving Python's performance, which I've
finally got around to uploading:

    http://starship.python.net/crew/mwh/hacks/speeding-python.html

Tell me what you think!

Cheers,
mwh

-- 
  Hmmm... its Sunday afternoon: I could do my work, or I could do a
  Fourier analysis of my computer's fan noise.
       -- Amit Muthu, ucam.chat (from Owen Dunn's summary of the year)


