[issue2792] alternate fast builtin sum, rev'd
Raymond Hettinger
report at bugs.python.org
Fri May 9 06:42:02 CEST 2008
Raymond Hettinger <rhettinger at users.sourceforge.net> added the comment:
Sorry, I really do not want to change the existing code. It's clear,
does exactly what it's supposed to do, is easily checked for
correctness, and has been seen and exercised in the alphas for a good
while. It gives a good tenfold speed-up (YMMV depending on machine and
compiler). Also, it is set up in a way that will make it easy to add
special handling of LongLongs.
I spent several days on the existing code to make sure that the
conversions and accumulations exactly match what sum() was already
doing behind the scenes. Then, a good deal of time was spent making
before/after timings of various cases and comparing the results to
psyco-generated code. I'm reluctant to throw away those efforts and
then invest the same time in another patch that cannot produce any
significant speedups.
The submitted code is longer and harder for me to verify as correct --
I've spent hours with it already and am still not confident about it.
ISTM that the code is twisting itself in knots just to avoid a single
call to PyNumber_Add in an uncommon case. I find the
coding style uncomfortable and do not readily see how to extend it to
the LongLong case.
The existing code follows a series of required steps to assume a type
and continuously verify that assumption. Those steps, which comprise
the bulk of the work, are mandatory. So, all you can do with alternate
patches is rearrange the loops and inlining for a microscopic speed-up
at best. There is no new approach here that is worth the time spent
reviewing and re-reviewing patches. Unless you can get meaningful
speed-ups in the common cases, please stop re-arranging this code.
Also, the submission should have been accompanied by a full set of
before/after timings across a variety of use cases including short
lists, summing all ints, summing all floats, summing a random mix of
ints and floats, all longs, etc. And, it would have been nice to have
a conceptual statement of what the code purports to do differently --
what you think makes it better.
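
A minimal sketch of what such a timing run could look like, using the
standard timeit module. The helper names make_cases and time_cases, the
list sizes, and the repeat counts are illustrative assumptions, not
part of the patch or of the timings actually run. On Python 3 there is
no separate long type, so "all longs" is approximated with ints too
large for a C long.

```python
import random
import timeit


def make_cases(n=1000, seed=0):
    """Build the input lists named above (sizes are arbitrary choices)."""
    rng = random.Random(seed)
    return {
        "short list": [1, 2, 3],
        "all ints": list(range(n)),
        "all floats": [float(i) for i in range(n)],
        "mixed ints/floats": [rng.choice((i, float(i))) for i in range(n)],
        # Stand-in for Python 2 longs: values that do not fit in a C long.
        "all longs": [2**70 + i for i in range(n)],
    }


def time_cases(func, cases, number=200, repeat=3):
    """Return the best-of-`repeat` time in seconds for func on each case."""
    results = {}
    for name, data in cases.items():
        timer = timeit.Timer(lambda: func(data))
        results[name] = min(timer.repeat(repeat=repeat, number=number))
    return results


if __name__ == "__main__":
    # Run once per interpreter build (before and after the patch) and
    # compare the two tables by hand.
    for name, secs in time_cases(sum, make_cases()).items():
        print(f"{name:20s} {secs:.6f} s")
```

Running the same script against the unpatched and the patched build
gives one table per build, covering each of the use cases listed.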
Sorry. I know you're having fun with this. But micro-tweaks at this
point are a total waste of time. It takes too much effort to verify
correctness, run all the timings, and create code that is
maintainable/extendable. Try to be content with the tenfold speedup
we've already gotten.
A better use of time would be to scan the code base for other places
that would benefit from the pattern of assuming a type, verifying the
assumption, running type specific code, and falling back if the
assumption fails. Focus the effort on common use cases (as was done
for sum(), where the effort focused on all ints, all floats, and a
mixed case).
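
The pattern described above can be sketched in pure Python as follows.
This is only an illustration of the control flow, not the C code in
the builtin: the name fast_sum is hypothetical, and in pure Python the
type checks buy no speed. In C, the type-specific branches would
accumulate in a native long or double instead of calling the generic
add.

```python
def fast_sum(iterable, start=0):
    """Illustrative sum(): assume a type, verify it, fall back on failure."""
    it = iter(iterable)
    total = start

    # Phase 1: assume every item is an exact int and verify each one.
    if type(total) is int:
        for item in it:
            if type(item) is int:
                total += item          # type-specific path
            else:
                total = total + item   # assumption failed: generic add
                break

    # Phase 2: assume floats, allowing exact ints to be mixed in.
    if type(total) is float:
        for item in it:
            if type(item) is float:
                total += item
            elif type(item) is int:
                total += float(item)
            else:
                total = total + item   # assumption failed: generic add
                break

    # Fallback: generic addition for whatever remains in the iterator.
    for item in it:
        total = total + item
    return total
```

Each phase consumes items from the same iterator, so an item is added
exactly once no matter where an assumption fails.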
----------
resolution: -> rejected
status: open -> closed
__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue2792>
__________________________________