Homogeneous Sequences in Python?

Eric Jacobs eaj at ricochet.net
Wed Dec 27 00:46:56 EST 2000


cbarker at jps.net:

> The classic answer to why Python cannot easily be made faster, even if 
> it were compiled, is that, because of Python's dynamic nature, an 
> expression like "x = a + b" cannot be compiled to efficient code: it is 
> not known until runtime what types a and b are, and what it means to 
> __add__ them. 
> I am a BIG fan of Python's dynamic nature, so I would not like to
> see any of that flexibility go away.

The "holy grail" here would be to have automagic static type
inference, I think. This has been done for Scheme, for example,
without losing any dynamic flexibility.

> I have seen proposals for "optional static typing" that would address 
> this issue (I believe Guido is talking about it for Py3k). That would 
> certainly help, but it could be somewhat ugly as well. What occurs to me
> are two observations:

I hope that these proposals are not intended to solve the speed
problem!

> 1) The occasional statically typed integer is not going to make much 
> difference. What would make a difference is a large collection of
> integers, for example.

It could. Consider an integer that is incremented inside nested
loops. Once it gets above the small-integer cache (CPython preallocates
values up to about 100), Python ends up allocating and initializing a
new integer object on every increment.
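A quick CPython-only illustration of that allocation behavior (the cached range is an implementation detail -- roughly -5 to 256 in current CPython, and only up to about 100 back when this was written):

```python
# CPython implementation detail: small integers are preallocated
# singletons, while larger values get a fresh object on each creation.

def same_object(text):
    # int() at runtime defeats compile-time constant folding, so we
    # really observe the allocator's behavior, not the compiler's.
    a, b = int(text), int(text)
    return a is b

print(same_object("7"))     # True  - cached small int, no allocation
print(same_object("1000"))  # False - a new object is allocated each time
```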

> 2) More often than not, when I build a large sequence, it is 
> homogeneous, that is, a large list (or tuple, whatever) of the same type:
> maybe a list of strings ( file.readlines() ), a list of integers
> (range(n) ), or even a list of lists.
> 
> So, what we need to speed things up is to have a way for Python to 
> process large sequences of the same type without having to type check
> each item in the sequence.
>
> The simple way to implement this is to have an optionally static type 
> declaration for sequence: list.is_list_of_integers, or something like 
> that. Then when this list is processed, the interpreter would know what 
> it was dealing with.

But the interpreter already knows what it's dealing with. All it has
to do is check the ob_type field! If the ob_type of a variable
is known to be constant at _compile_ time (say, if you do a for loop
over a statically typed list of integers), then it's possible to
move the vtable-style lookup, and possibly some sanity checks, out of
the loop (of course, no bytecode instructions exist to facilitate
this)...
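You can see the cost of that per-iteration lookup by hoisting it by hand -- which is exactly the transformation a hypothetical optimizer would apply automatically once the element type is pinned down:

```python
def slow(items):
    out = []
    for x in items:
        out.append(x + 1)   # "out.append" is looked up on every iteration
    return out

def fast(items):
    out = []
    append = out.append     # lookup hoisted out of the loop, done once
    for x in items:
        append(x + 1)       # plain local-variable call inside the loop
    return out

data = list(range(10000))
assert slow(data) == fast(data)   # same semantics, fewer lookups
```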

> It then struck me that it could certainly be done 
> automatically, on-the-fly: every time an item was added to a sequence, 
> its type would be checked, and some kind of "homogeneous
> flag" would be set:

...but if you don't know anything about the type at compile time,
you have to generate code for the general case -- the ordinary
bytecodes that do the method lookup and type checks as usual. Of
course, if you can't infer the type absolutely (what if someone
goes to the interpreter prompt and makes a list of class Foo?),
you can compile two versions of the function -- one for a generic
list and one for a list-of-ints, say -- assuming you know what
types it will likely be called with.

What I think I hear you proposing is to have objects be able to
track their own type beyond the single layer that is indicated in
the ob_type field. Then if we know how to make optimized functions
for certain cases, we can dispatch based on this "rich type":

> a = [] # type unknown

__getitem__: raises IndexError


> a.append(4) # homogeneous list of integers

__getitem__: returns an object of IntegerType, or raises IndexError


> a.append(56) # still a homogeneous list of integers

__getitem__: returns an object of IntegerType, or raises IndexError


> a.append(4.5) # Not homogeneous

__getitem__: returns an object of NumericProtocolType, or 
raises IndexError

(I'm assuming the Py3K-ish type/class consolidation here -- none
of this is "real".)

If we have a compiled version of a function that is optimized
for the case where its argument's __getitem__ method returns a
specific type, we can dynamically dispatch to that version
instead of the generic one. From that point on, we can avoid all
dynamic dispatching.
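Here's a toy sketch of that idea in plain Python -- a list that tracks its own element type, plus a dispatcher that picks a specialized function when the tracked type has one (all names here are made up for illustration):

```python
class TrackedList(list):
    """Toy "rich type" tracking: a list that remembers the common type
    of its elements (None while empty, object once it becomes mixed)."""

    def __init__(self):
        super().__init__()
        self.elem_type = None            # type unknown

    def append(self, item):
        t = type(item)
        if self.elem_type is None:
            self.elem_type = t           # first element sets the guess
        elif self.elem_type is not t:
            self.elem_type = object      # no longer homogeneous
        super().append(item)


def sum_ints(seq):
    # Specialized version: safe to skip per-element checks because the
    # dispatcher only sends us homogeneous lists of ints.
    total = 0
    for x in seq:
        total += x
    return total

def sum_generic(seq):
    return sum(seq)                      # the general, fully dynamic case


SPECIALIZED = {int: sum_ints}            # "compiled" variants by element type

def dispatch_sum(seq):
    impl = SPECIALIZED.get(getattr(seq, "elem_type", object), sum_generic)
    return impl(seq)


a = TrackedList()
a.append(4)                              # homogeneous list of integers
a.append(56)                             # still homogeneous
assert a.elem_type is int
a.append(4.5)                            # not homogeneous any more
assert a.elem_type is object
```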

> I have no idea how much this would slow append operations, but I 
> suspect much less than the potential speed-up of knowing that a list is 
> homogeneous when it is. A plus of doing this automatically is that it 
> would take NO changes to existing code to take advantage of the new
> feature.

I agree that we should look at optimizations that will work without
changes to existing code. Actually, here we may be able to get the
best of both worlds. Think of the extended type tracking as a kind of
"profiler" which collects statistical type information based on
what types are encountered in the real world. Then, this data can
be fed back into the compiler, which optimizes for those static types.
Once the optimized versions are entered, the extended type codes
(actually, any type codes) are no longer needed in the wonderful
world of static types, which continues from that point onwards
(call-graphically speaking).
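A minimal sketch of such a type-collecting "profiler", as a decorator that just counts the argument-type combinations a function actually sees at run time (hypothetical names throughout -- the counts are what a specializing compiler would consume):

```python
from collections import Counter
import functools

def type_profile(fn):
    """Toy profiler: record which argument types a function is really
    called with; the tallies could feed a specializing compiler."""
    counts = Counter()

    @functools.wraps(fn)
    def wrapper(*args):
        counts[tuple(type(a).__name__ for a in args)] += 1
        return fn(*args)

    wrapper.type_counts = counts         # expose the collected data
    return wrapper

@type_profile
def add(a, b):
    return a + b

add(1, 2)
add(3, 4)
add("x", "y")
print(add.type_counts.most_common(1))
# [(('int', 'int'), 2)] -> compile an int-specialized version first
```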

The profiler could even run transparently as the program runs,
generating the fast VM code (or maybe even machine code) in real
time and keeping track of itself in a pickle for successive runs.
It could try to find that magic threshold where Pythoners would
naturally split up their program into Python (dynamic dispatch based
on rich type) and C (statically typed) modules. Now that would be
crazy.

The statistical, "real life" approach could possibly produce better
optimizations than a perfect static type inferencer could (as I
mentioned before, the existence of the interactive interpreter
nullifies any possible proof of what kind of arguments a function
must receive).

> Would this really speed up Python code much, even for large homogeneous 
> sequences? I have no idea. I am in no way qualified to make that
> judgment, so I'd like to hear what folks think.

It depends a lot on how you divide up the run-time and compile-time
layers, and how they're going to interact.

> I suspect that it won't speed up straight, run-through-the-interpreter 
> code enough to make it worth it, but it could have some real benefits in
> three places:
> 
> 1) some of the sequence-oriented operations that are being introduced:
> list comprehensions, and the possible future Elementwise/Objectwise
> Operators (see PEP 225)
> 
> 2) Py2C, the Python-to-C translator. I think a really important element 
> of Python's future may be Py2C (and perhaps eventually a Python compiler 
> or JIT compiler). That seems to have been the direction of a number of 
> interpreted languages. With optional static typing and compilation, LISP 
> can be as fast as (or faster than) C++ for numerics. MATLAB now has a 
> translator to C++ that can create faster code and fully compiled 
> stand-alone applications; that has almost eliminated the need to write 
> external functions in C. Py2C can only be so fast without some kind of 
> static typing.

Or type inference. It surprises me to hear you say this, because you
seemed to be making a case for type inference with the list-of-integers
argument. Or maybe I don't understand what you're getting at?

> I think homogeneous sequences could do it. Right now Py2C 
> can convert "for i in range(1000)" into a fast C loop, but can't do 
> anything with "for i in list_of_integers". If it had a way of knowing 
> that it was a list of integers, it could be highly optimized.

Right, but Py2C would need to know it, not just Python. That's
the challenge ;-)

> 3) Extensions to which you have to pass a lot of data. An example is 
> wxPython. In order to draw a polyline with a couple of thousand points, 
> for example, you have to pass a list of 2-tuples of integers of the 
> coordinates of the points. The extension has to do a type check on every 
> one of the numbers in every one of the tuples in the list. Frankly, this 
> is still pretty fast, but not as fast as it could be.

External functions can, of course, provide known type signatures which
can greatly assist any type inference engine. Then again, perhaps it
is best not to rely too much on a C function's demands: I often do a
PyNumber_Int followed by a PyInt_AsLong to accept any kind of
number object as input. It's kind of nice how Python does automatic
conversions like that.
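The Python-level analogue of that convert-then-extract pattern is just calling int() up front, which accepts floats, bools, and any user-defined number that knows how to convert itself (Pixels is a made-up class for illustration):

```python
def set_width(w):
    # Mirror of the C-level "convert first, then extract" pattern:
    # int() accepts anything that can behave as an integer.
    return int(w)

class Pixels:
    """Hypothetical user-defined number type."""
    def __init__(self, n):
        self.n = n
    def __int__(self):
        return self.n

assert set_width(3.7) == 3       # float truncated, as PyNumber_Int would
assert set_width(True) == 1      # bool is accepted too
assert set_width(Pixels(5)) == 5 # user-defined conversion honored
```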


> My goal is to be able to write applications that contain some 
> numerically intensive code (or other repetitious simple processing) and 
> not to have to use anything but Python. I'm going to have to keep 
> writing C for a large amount of the core code if something isn't done a 
> little differently.

This is, I think, the "Python approach" for problems like this:
use a C module for the time-intensive stuff, use Python for the
"high-level" control. Could this be done automatically? I think
so.

> What about Numeric?
> 
> NumPy provides the Multiarray object, which does provide what I am 
> talking about, a homogeneous sequence (at least of numbers, which is 
> mostly what I need). As a result, it can implement very fast array 
> oriented operations that eliminate the need for a large amount of hand 
> coded C extensions. The problem is that core Python has no idea that it 
> is a homogeneous sequence, so when you have to do some looping, things
> slow WAY down.

The critical point here is, I think, who's getting that info: core Python
is only the first part of the battle. The code that Python is running
has to know too.
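For what it's worth, the standard library's array module already gives you homogeneous unboxed storage -- and it illustrates exactly this point: the interpreter still boxes each element back into a full Python object the moment Python code touches it:

```python
import array

floats_list = [float(i) for i in range(1000)]
floats_array = array.array("d", floats_list)   # "d" = C double

# The array stores raw 8-byte doubles instead of full float objects...
print(floats_array.itemsize)      # 8 bytes per element

# ...but Python code still sees a generic sequence: indexing it
# re-boxes the raw double into an ordinary float object.
print(type(floats_array[0]))      # <class 'float'>
```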
