[Patches] [ python-Patches-1700288 ] Armin's method cache optimization updated for Python 2.6

Sun Jun 10 04:08:35 CEST 2007

Patches item #1700288, was opened at 2007-04-13 15:16
Message generated for change (Comment added) made by bioinformed
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1700288&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Performance
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Kevin Jacobs (bioinformed)
Assigned to: Raymond Hettinger (rhettinger)
Summary: Armin's method cache optimization updated for Python 2.6

Initial Comment:
I've forward ported and slightly cleaned up Armin's method cache patch (see #1685986).  I've attempted to clarify and tighten up several loose ends in the code, so hopefully I haven't mucked anything up.

My performance results are not quite as good as Armin's, though still very encouraging.  I see a typical speed up of 10%, with heavily object oriented application code seeing speedups of 15-20%.  Given these rather significant results, the major task is to verify correctness.

----------------------------------------------------------------------

>Comment By: Kevin Jacobs (bioinformed)
Date: 2007-06-09 22:08

Message:
Logged In: YES 
user_id=1768698
Originator: YES

First off, all versions of this patch do away with the rather aggressively
repeated inline code.  My previous comment about refactoring and testing an
inlined form were purely an experiment with null results.

That aside, you do raise a good question.  However, given the current
patch, it is unfortunately off-topic and irrelevant to the consideration of
this patch.  Please feel free to pursue it elsewhere, since I worry that it
will only serve as a negative distraction from the much more interesting
aims of the mro optimization.  Since you are clearly worried about the
performance of attribute lookup, please try it out and report your
findings.  I'll be happy review the results from your benchmarks and any
suggestions you have.

----------------------------------------------------------------------

Comment By: Eyal Lotem (peaker)
Date: 2007-06-09 21:16

Message:
Logged In: YES 
user_id=231480
Originator: NO

Why is CPython copying functions to inline them?

This is not only code duplication that makes the code easy to break, it
also bloats the caller function beyond readability.

I believe that using a private .c #include file with an inline keyword, or
at worst, #include the function code directly is a better solution.

Is there any reason to use the method currently used by CPython?

----------------------------------------------------------------------

Comment By: Kevin Jacobs (bioinformed)
Date: 2007-05-14 16:18

Message:
Logged In: YES 
user_id=1768698
Originator: YES

I tried re-inlining the fast path from _PyType_Lookup() in object.c and
found no measurable improvement on the simple benchmarks I tried.  I've
also stress-tested the patch by disabling the fast-path return, always
performing the slow-path lookup, and asserting that the cached result
matches the slow-path result.  I then ran that modified interpreter on the
Python test-suite, various benchmarks, and a range of my own applications. 
While not a formal proof of correctness, it was encouraging that the cache
remained consistent.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2007-05-14 15:12

Message:
Logged In: YES 
user_id=4771
Originator: NO

A further minor improvement could be achieved by
re-doing the inlining of _PyType_Lookup() in
object.c.  Ideally, the fast path only could be
inlined, and the rest put in a separate function
or just using _PyType_Lookup().

----------------------------------------------------------------------

Comment By: Kevin Jacobs (bioinformed)
Date: 2007-04-14 09:49

Message:
Logged In: YES 
user_id=1768698
Originator: YES

I benchmarked using pystone, pybench, and a bunch of my local scientific
applications that have tight computational kernels still in pure Python.  I
tested on a 64bit Linux box, defining the version_tag as either int or
unsigned long with no appreciable difference in performance.  I'm trying to
get parrotbench to run, but it doesn't seem to have been updated for modern
Python versions.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2007-04-14 02:21

Message:
Logged In: YES 
user_id=33168
Originator: NO

I tried this test on parrotbench (b0.py in particular) and I'm not sure I
could distinguish an improvement over the noise (best case was about 5%). 
The variability is pretty high on my box (dual amd64 opertons, ubuntu gcc
4.1.2-ish).  At first it seemed a bit slower with the original patch which
uses unsigned ints.  I tried changing to unsigned longs.  It seemed a
little better, though not sure if it was really due to the change.  I think
unsigned longs should be used for 64-bit boxes.

Did you use a benchmark/program that is open source?  I don't know that I
have anything decent to test this with.  Raymond probably does though. 
Also what platform did you test on?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1700288&group_id=5470