[Python-ideas] optimized VM ideas

Sat Jan 24 22:04:35 CET 2009

On Jan 24, 2009, at 1:53 AM, joe wrote:

> On Fri, Jan 23, 2009 at 2:36 AM, Leonardo Santagada
> <santagada at gmail.com> wrote:
>>
>> I don't think TraceMonkey is slower than V8. The last time I looked,
>> TraceMonkey was faster than V8, and it is getting even faster with
>> each iteration.
>>
>> The way TraceMonkey works reminds me a bit of Psyco, although I might
>> be mixing it up with the PyPy JIT. But talking about Psyco, why don't
>> people go help Psyco if all they want is a JIT? It is not as if the
>> idea of having a JIT for Python is even new... Psyco was optimizing
>> code before WebKit/V8 even existed.
>>
>>
>>> This leads me to believe that relatively simple, more general
>>> concepts in VM design can have a bigger impact than specific, highly
>>> complicated JIT solutions, in the context of dynamic languages that
>>> can't be easily typed at compile time.
>>
>> I believe the exact opposite, and TraceMonkey is probably one of the
>> proofs of that...
>>
> Well, perhaps there's validity to both sides; V8 is still pretty fast.
> I didn't know they'd improved TraceMonkey that much, interesting.
> Trace trees struck me as fairly different from Psyco's design (it's
> been a while since I looked at the papers for both though); Psyco
> essentially was designed to optimize specific forms of code, while
> trace trees are more generalized.

Yes, I confused the design of the PyPy JIT with Psyco's, sorry.

>>> So I've thought of a few ideas for a new, more streamlined Python
>>> VM:
>>>
>>> * Simplify the CPython object model as much as possible, while still
>>>   allowing most of the power of the current model.
>>
>> This would modify the language, so it might be interesting, but would
>> generate something which is not Python.
>
> I don't think so.  I didn't mean modify the Python object model, I
> meant use a modified version of CPython's implementation of it.  The
> CPython C API isn't part of the language standard, after all, and it's
> kind of inefficient and complex, imho.

I never messed much with CPython's implementation of it, but maybe.
If I remember correctly, the idea is for it to be reasonably simple, so
that it is easy to write extension modules and to embed it in C
programs.

>>
>>> * Either keep reference counting, or experiment with some of the
>>>   newer techniques such as pointer escaping. Object models that
>>>   exclusively rely on cyclic GCs have many issues and are hard to
>>>   get right.
>>
>> Don't know, but a good GC is way faster than what CPython is doing
>> already. Maybe it is a good idea to explore some other perspectives
>> on this, though.
>>
>
> I disagree.  I dislike GCs as a generalized solution; you don't always
> need the overhead of a full GC.  Reference counting + a GC seems like
> a better compromise, since there's less work for it to do (or rather,
> the work is spread over time in smaller amounts).

A generational GC does exactly that: it spreads the work of a
collection over time, not as finely grained as refcounting but maybe
simpler to implement if in the end you also want to get rid of the
GIL... but yes, this is still an open question.
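
The "reference counting + a GC" compromise discussed above can be observed
directly. The following is only a minimal sketch and assumes CPython, where
refcounting frees acyclic garbage immediately and the gc module exists to
find cycles; Node and probe are hypothetical names made up for this
illustration.

```python
import gc
import weakref


class Node:
    """Plain object that can point at another Node."""

    def __init__(self):
        self.other = None


def probe():
    """Return (acyclic_freed_by_refcount, cycle_survives_del, cycle_freed_by_gc)."""
    gc.disable()  # keep the cycle collector from running on its own
    try:
        # Acyclic garbage: the refcount hits zero on `del`, freed at once.
        a = Node()
        a_ref = weakref.ref(a)
        del a
        acyclic_freed = a_ref() is None

        # Cyclic garbage: the two objects keep each other's count above zero.
        x, y = Node(), Node()
        x.other, y.other = y, x
        x_ref = weakref.ref(x)
        del x, y
        cycle_survives = x_ref() is not None

        gc.collect()  # only the cycle detector can reclaim the pair
        cycle_freed = x_ref() is None
        return acyclic_freed, cycle_survives, cycle_freed
    finally:
        gc.enable()


print(probe())
```

On CPython all three values are expected to be True. Other implementations
(PyPy, for instance) do not use reference counting, so the first value is not
guaranteed there.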

>>> * Possibly modify the bytecode to be register-based, as in
>>>   SquirrelFish. Not sure if this is worth it with Python code.
>>
>> Maybe it would help a bit. I don't think it would help more than 10%,
>> tops (but I am completely guessing here).
>
> Ah, 10% sounds like it would be worth it, actually.  The simple code
> generation SquirrelFish does is interesting too; it essentially
> compiles the opcodes to native code.  The code for simpler opcodes
> (and for simple execution paths in the more complex ones) is inlined
> in the native code stream, while more complex opcodes are called as
> functions.

There is a tool in PyPy that does this too; you should look at it (I
always forget the name, but it should be easy to find on the PyPy site).
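
To make the stack-versus-register point concrete, here is a toy sketch that
evaluates a + b * c on two tiny interpreters and counts how many instructions
each one dispatches. Both instruction sets are invented for this example and
do not correspond to CPython's or SquirrelFish's real bytecode.

```python
def run_stack(code, env):
    """Stack VM: operands are pushed and popped implicitly."""
    stack, dispatched = [], 0
    for op, arg in code:
        dispatched += 1
        if op == "LOAD":
            stack.append(env[arg])
        elif op == "ADD":
            rhs = stack.pop()
            stack.append(stack.pop() + rhs)
        elif op == "MUL":
            rhs = stack.pop()
            stack.append(stack.pop() * rhs)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop(), dispatched


def run_register(code, env):
    """Register VM: each instruction names its destination and operands."""
    regs, dispatched, last = dict(env), 0, None
    for op, dst, lhs, rhs in code:
        dispatched += 1
        if op == "ADD":
            regs[dst] = regs[lhs] + regs[rhs]
        elif op == "MUL":
            regs[dst] = regs[lhs] * regs[rhs]
        else:
            raise ValueError(f"unknown opcode {op!r}")
        last = dst
    return regs[last], dispatched


ENV = {"a": 2, "b": 3, "c": 4}

# a + b * c as stack code: five dispatches.
STACK_CODE = [("LOAD", "a"), ("LOAD", "b"), ("LOAD", "c"),
              ("MUL", None), ("ADD", None)]

# The same expression as register code: two dispatches.
REGISTER_CODE = [("MUL", "t0", "b", "c"), ("ADD", "t1", "a", "t0")]

print(run_stack(STACK_CODE, ENV))        # (14, 5)
print(run_register(REGISTER_CODE, ENV))  # (14, 2)
```

Fewer dispatches is the potential win; the cost is wider instructions and a
more involved compiler, which is why the size of the gain is hard to guess
without measuring.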

>>> * Use direct threading (which is basically optimizing switch
>>>   statements to be only one or two instructions) for the bytecode
>>>   loop.
>>
>> The problem with this is (besides the error someone has already
>> pointed out in your phrasing) that Python has really complex
>> bytecodes, so this would also only gain around 10%, and it only works
>> with compilers that accept goto labels, which MSVC for example does
>> not (maybe there are more compilers that don't either).
>
> Python's bytecode isn't all that complex, when I looked at it.  It's
> not that much worse than SquirrelFish's original bytecode
> specification (which I need to look at again, btw, not sure what
> they're doing now).  I was kind of surprised, thought it'd be much
> worse.

There was a discussion about this on pypy-dev only a week ago; you
might have some fun looking at the archives.
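
For readers unfamiliar with the term, the idea behind direct threading can
be modeled roughly as follows. In C it relies on GCC's "labels as values"
extension, so that each handler jumps straight to the next one; Python has
no equivalent, so this sketch only models the part where opcodes are
resolved to handler references once, instead of being decoded on every
execution. All names here are hypothetical.

```python
def op_push(stack, arg):
    stack.append(arg)


def op_add(stack, arg):
    rhs = stack.pop()
    stack.append(stack.pop() + rhs)


def op_mul(stack, arg):
    rhs = stack.pop()
    stack.append(stack.pop() * rhs)


HANDLERS = {"PUSH": op_push, "ADD": op_add, "MUL": op_mul}


def run_switch(code):
    """Classic loop: decode the opcode again on every single execution."""
    stack = []
    for name, arg in code:
        if name == "PUSH":
            op_push(stack, arg)
        elif name == "ADD":
            op_add(stack, arg)
        elif name == "MUL":
            op_mul(stack, arg)
        else:
            raise ValueError(f"unknown opcode {name!r}")
    return stack.pop()


def thread(code):
    """Resolve every opcode to its handler once, ahead of execution."""
    return [(HANDLERS[name], arg) for name, arg in code]


def run_threaded(threaded):
    """Threaded loop: no decoding left, just call the stored handler."""
    stack = []
    for handler, arg in threaded:
        handler(stack, arg)
    return stack.pop()


# 2 + 3 * 4
PROGRAM = [("PUSH", 2), ("PUSH", 3), ("PUSH", 4), ("MUL", None), ("ADD", None)]

print(run_switch(PROGRAM))            # 14
print(run_threaded(thread(PROGRAM)))  # 14
```

Whether this pays off depends on how much of the time goes to dispatch as
opposed to the work inside each handler, which is exactly the disagreement
above about how complex Python's bytecodes are.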

>>> * Remove string lookups for member access entirely, and replace
>>>   them with a system of unique identifiers.  The idea is you would
>>>   use a hash in the types to map a member id to an index.  Hashing
>>>   ints is faster than hashing strings, and I've even thought about
>>>   experimenting with using collapsed arrays instead of hashes.  Of
>>>   course, the design would still need to support string lookups when
>>>   necessary.  I've thought about this a lot, and I think you'd need
>>>   the same general idea as V8's hidden classes for this to work
>>>   right (though instead of classes, it'd just be member/unique id
>>>   lookup maps).
>>
>> A form of hidden classes is already part of PyPy (but I think that
>> only the JIT does this). But you can't simply remove string lookups,
>> as people can implement special methods to track this in current
>> Python. As I said before, I don't believe changing the semantics of
>> Python for the sake of performance is even possible.
>
> Obviously you'd have to go through the effort to *not* change the
> language semantics, which would mean still allowing things like
> __getattr__/__setattr__ and __getattribute__, a working __dict__, etc.

That's one of the reasons I think that a JIT is probably the only answer
for performance in Python. At any time you could add any of the
special methods to a class...
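
The "member/unique id lookup maps" idea can be sketched in a few lines. This
is only an illustration of the general hidden-class technique, not V8's or
PyPy's actual implementation; Shape, Obj and InlineCache are hypothetical
names. A real VM would also have to fall back to the ordinary lookup path
whenever a class defines __getattr__, __getattribute__ and friends, which is
the complication raised above.

```python
class Shape:
    """Maps attribute names to slot indices; shared by objects built alike."""

    def __init__(self, slots=None):
        self.slots = slots if slots is not None else {}
        self.transitions = {}

    def with_attr(self, name):
        """Return the shape reached by adding `name`, creating it only once."""
        nxt = self.transitions.get(name)
        if nxt is None:
            slots = dict(self.slots)
            slots[name] = len(slots)
            nxt = self.transitions[name] = Shape(slots)
        return nxt


EMPTY = Shape()


class Obj:
    """Object whose attributes live in a flat list indexed via its shape."""

    def __init__(self):
        self.shape = EMPTY
        self.values = []

    def set(self, name, value):
        index = self.shape.slots.get(name)
        if index is None:
            self.shape = self.shape.with_attr(name)
            self.values.append(value)
        else:
            self.values[index] = value

    def get(self, name):
        return self.values[self.shape.slots[name]]


class InlineCache:
    """Per-call-site cache: remembers the (shape, index) of the last lookup."""

    def __init__(self, name):
        self.name, self.shape, self.index, self.hits = name, None, None, 0

    def __call__(self, obj):
        if obj.shape is self.shape:  # fast path: identity check, no hashing
            self.hits += 1
            return obj.values[self.index]
        self.shape = obj.shape       # slow path: one name lookup, then cache
        self.index = obj.shape.slots[self.name]
        return obj.values[self.index]


p, q = Obj(), Obj()
for o, (x, y) in ((p, (1, 2)), (q, (10, 20))):
    o.set("x", x)
    o.set("y", y)

get_x = InlineCache("x")
print(p.shape is q.shape)          # True: same attributes, same order
print(get_x(p), get_x(q), get_x.hits)  # 1 10 1
```

The string lookup only happens on a cache miss; objects that were built the
same way share a shape, so repeated accesses at one call site reduce to an
identity check and a list index.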

>>> I'm not sure I'll have the time anytime soon to prototype these
>>> ideas, but I thought I'd kick them out there and see what people
>>> say.  Note, I'm in no way suggesting any sort of change to the
>>> existing CPython VM (it's way, way too early for that kind of talk).
>>
>> If you are not talking about changing the CPython VM, why not look
>> at Psyco and PyPy? :)
>
> I looked at Psyco.  It didn't look like it had much further potential
> to me.  It only optimizes certain situations; it's not very
> generalized.  PyPy looks interesting though.

Psyco optimizes a lot of situations, and I gather the next version is
going to optimize generator expressions, but yes, I guess PyPy is the
long-term answer to performance in Python.

ps1: I thought you just forgot to send this to the list, so I am
replying to it here; hope you don't mind.

ps2: I'm not part of the PyPy core team or anything, so my view is just,
well, my view.
--
Leonardo Santagada
santagada at gmail.com