[Patches] [ python-Patches-1700288 ] Armin's method cache optimization updated for Python 2.6

Sun Jun 10 03:16:43 CEST 2007

Patches item #1700288, was opened at 2007-04-13 22:16
Message generated for change (Comment added) made by peaker
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1700288&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Performance
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Kevin Jacobs (bioinformed)
Assigned to: Raymond Hettinger (rhettinger)
Summary: Armin's method cache optimization updated for Python 2.6

Initial Comment:
I've forward-ported and slightly cleaned up Armin's method cache patch (see #1685986).  I've attempted to clarify and tighten up several loose ends in the code, so hopefully I haven't mucked anything up.

My performance results are not quite as good as Armin's, though still very encouraging.  I see a typical speedup of 10%, with heavily object-oriented application code seeing speedups of 15-20%.  Given these rather significant results, the major remaining task is to verify correctness.

----------------------------------------------------------------------

Comment By: Eyal Lotem (peaker)
Date: 2007-06-10 04:16

Message:
Logged In: YES 
user_id=231480
Originator: NO

Why is CPython copying functions to inline them?

Not only is this code duplication that makes the code easy to break, it
also bloats the caller function beyond readability.

I believe that using a private .c #include file with an inline keyword or,
at worst, #including the function code directly would be a better solution.

Is there any reason to use the method currently used by CPython?

----------------------------------------------------------------------

Comment By: Kevin Jacobs (bioinformed)
Date: 2007-05-14 23:18

Message:
Logged In: YES 
user_id=1768698
Originator: YES

I tried re-inlining the fast path from _PyType_Lookup() in object.c and
found no measurable improvement on the simple benchmarks I tried.  I've
also stress-tested the patch by disabling the fast-path return, always
performing the slow-path lookup, and asserting that the cached result
matches the slow-path result.  I then ran that modified interpreter on the
Python test suite, various benchmarks, and a range of my own applications.
While not a formal proof of correctness, it was encouraging that the cache
remained consistent.
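
The consistency check described above can be sketched in pure Python.
This is a toy model under stated assumptions, not the patch itself: the
real cache lives in C, and the names checked_lookup, slow_lookup,
invalidate, and the version-tag bookkeeping are all hypothetical.  The
fast-path return is "disabled" in the sense that the slow MRO walk always
runs and any cached value is only compared against it.

```python
# Toy model only: every name here is hypothetical, not CPython's API.
_cache = {}      # (version_tag, attribute name) -> value found on the MRO
_version = {}    # class -> current version tag
_next_tag = 0


def _tag(cls):
    """Return the class's version tag, assigning a fresh one if needed."""
    global _next_tag
    if cls not in _version:
        _next_tag += 1
        _version[cls] = _next_tag
    return _version[cls]


def invalidate(cls):
    """Drop the tags of cls and all its subclasses after cls is modified."""
    _version.pop(cls, None)
    for sub in cls.__subclasses__():
        invalidate(sub)


def slow_lookup(cls, name):
    """Walk the MRO and return the first match, or None if there is none."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return None


def checked_lookup(cls, name):
    """Always take the slow path, then verify any cached value against it."""
    key = (_tag(cls), name)
    fresh = slow_lookup(cls, name)
    if key in _cache and _cache[key] is not fresh:
        raise AssertionError("stale cache entry for %r" % (name,))
    _cache[key] = fresh
    return fresh
```

In this model, modifying a class without calling invalidate() on it makes
the next checked_lookup() raise, which is the kind of inconsistency the
stress test was looking for.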

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2007-05-14 22:12

Message:
Logged In: YES 
user_id=4771
Originator: NO

A further minor improvement could be achieved by
re-doing the inlining of _PyType_Lookup() in
object.c.  Ideally, only the fast path would be
inlined, with the rest put in a separate function
or handled by just calling _PyType_Lookup().
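
The fast-path/slow-path split suggested here can be illustrated with a
small Python model.  This is only a sketch under assumptions: the cache
layout (a fixed-size table indexed by a hash of version tag and name),
CACHE_SIZE, and the function names are hypothetical, and invalidation is
left out entirely.  The point is that lookup() holds only the cheap cache
probe, which is the part one would inline in C, while everything else
sits in lookup_slow().

```python
# Toy model only: layout, sizes, and names are hypothetical.
CACHE_SIZE = 1 << 10              # must be a power of two for the mask below
_entries = [None] * CACHE_SIZE    # each slot: (version, name, value) or None
_versions = {}                    # class -> version tag
stats = {"fast": 0, "slow": 0}    # counters, for illustration only


def _slot(version, name):
    """Map a (version tag, name) pair onto a table index."""
    return hash((version, name)) & (CACHE_SIZE - 1)


def lookup(cls, name):
    """Fast path: a single table probe, falling back to lookup_slow()."""
    version = _versions.get(cls)
    if version is not None:
        entry = _entries[_slot(version, name)]
        if entry is not None and entry[0] == version and entry[1] == name:
            stats["fast"] += 1
            return entry[2]
    return lookup_slow(cls, name)


def lookup_slow(cls, name):
    """Slow path: walk the MRO, then record the result in the table."""
    stats["slow"] += 1
    value = None
    for klass in cls.__mro__:
        if name in vars(klass):
            value = vars(klass)[name]
            break
    version = _versions.setdefault(cls, len(_versions) + 1)
    _entries[_slot(version, name)] = (version, name, value)
    return value
```

Calling lookup() twice for the same class and name takes the slow path
once and the fast path once, as the stats counters show.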

----------------------------------------------------------------------

Comment By: Kevin Jacobs (bioinformed)
Date: 2007-04-14 16:49

Message:
Logged In: YES 
user_id=1768698
Originator: YES

I benchmarked using pystone, pybench, and a bunch of my local scientific
applications that have tight computational kernels still in pure Python.  I
tested on a 64-bit Linux box, defining the version_tag as either int or
unsigned long with no appreciable difference in performance.  I'm trying to
get parrotbench to run, but it doesn't seem to have been updated for modern
Python versions.
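
For readers comparing the two candidate types for version_tag, the
following sketch uses ctypes to report how wide each one is on the
machine it runs on.  The helper name tag_capacity is hypothetical, and the
snippet says nothing about how the patch handles tag exhaustion; it only
shows the sizes.  On an LP64 platform such as 64-bit Linux, unsigned int
is 32 bits and unsigned long is 64 bits.

```python
import ctypes


def tag_capacity(ctype):
    """Return (bits, number of distinct values) for an unsigned C type."""
    bits = 8 * ctypes.sizeof(ctype)
    return bits, 1 << bits


for label, ctype in (("unsigned int", ctypes.c_uint),
                     ("unsigned long", ctypes.c_ulong)):
    bits, count = tag_capacity(ctype)
    print("%-13s %2d bits, %d distinct tags" % (label, bits, count))
```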

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2007-04-14 09:21

Message:
Logged In: YES 
user_id=33168
Originator: NO

I tried this test on parrotbench (b0.py in particular) and I'm not sure I
could distinguish an improvement over the noise (best case was about 5%).
The variability is pretty high on my box (dual AMD64 Opterons, Ubuntu, gcc
4.1.2-ish).  At first it seemed a bit slower with the original patch, which
uses unsigned ints.  I tried changing to unsigned longs.  It seemed a
little better, though I'm not sure if it was really due to the change.  I
think unsigned longs should be used for 64-bit boxes.

Did you use a benchmark/program that is open source?  I don't know that I
have anything decent to test this with.  Raymond probably does though. 
Also what platform did you test on?

----------------------------------------------------------------------
