Speed: bytecode vz C API calls

Tue Dec 9 23:50:06 EST 2003

Stuart Bishop wrote in message ...
>About the only thing that could be done then is:
>
>def memoize(callable):
>    cache = {}
>    def proxy(*args, _cache=cache):

Not legal.  *args must always come after default args. There's no way to do
this without changing the calling characteristics of the function, which the
OP has stated is vital.

It is probably possible to custom-build a code object which makes _cache a
local reference to the outer cache, and play with the bytecode to make the
LOAD_DEREF into a LOAD_FAST, but it's not worth the effort.  I don't know
how fast/slow a LOAD_DEREF is compared to a LOAD_FAST, though. In any case,
it won't help more than a few percent.

Jacek, I think this is as fast as pure Python will get you with this
approach.  If the speed is simply not acceptable, there are some things you
need to consider, Is this the *real* bottleneck?

Consider this before you delve into C solutions, which greatly increase
headaches.  An easy way to check is to cut out the map and memoization and
just use a dictionary preloaded with known argument:value pairs for a given
set of test data.

e.g.,

def func(...):
    ...

data = [foo, bar, ....]
cache = dict([(i, func(i)) for i in data])

def timetrial(_data=data, _cache=cache):
    return map(_cache.__getitem__, _data)
# [operator.getitem(_cache, i) for i in _data] *might* be faster.

then timeit.py -s"import ttrial" "ttrial.timetrial()"

If it's still too slow, then either Python simply isn't for this task, or
you need to optimize your functions.

If this isn't the real bottleneck:
- Perhaps the functions I'm memoizing need optimization themselves?
- Are there any modules out there which implement these in C already?
- How does the competing C++ code do it?  Does it get its speed from some
wicked data structure for the memoizing, or are the individual functions
faster, or what?

If this *is* the only significant bottleneck, but you want to try harder to
speed it up, you can pursue the C closure path you were mentioning.  With
the caveat that I don't do any CPython hacking, I know that closures are
implemented using co_cellvars and the LOAD_DEREF bytecode.  The Python C api
does describe an api to cell objects, so that will be the place to look, if
it's possible to do a closure in C.  It may not be faster than your
straightforward approach, of course.

However, comparing your Python to a C++ version on the basis of speed is
most certainly the wrong way to go.  If the C++ is well designed with good
algorithms, its going to be faster than Python. Period.  The advantage of
Python is that it's much faster and easier to *write* (and less likely to
introduce bugs), easier to *read* and maintain, and can model your problem
more intuitively and flexibly.  But if you need maximum speed at all costs,
the best Python can do for you is act as a glue language to bits and pieces
of hand-crafted C.

Do you really *need* the speed, or are you just bothered that it's not as
fast as the C++ version?
--
Francis Avila