[Python-ideas] A General Outline for Just-in-Time Acceleration of Python

David Mertz mertz at gnosis.cx
Fri Jun 13 08:21:39 CEST 2014


There are many people with a better knowledge of PyPy than me; I've only
looked at it and read some general white papers from the team.  But I do
know with certainty that PyPy executes ALL Python code, not a restricted
subset[*].  You might be thinking of Cython in terms of an "annotated,
restricted, version of Python."

However, PyPy itself is *written* in a restricted subset called RPython,
which also might be what you are thinking of.  Well, much of PyPy is
written that way, I'm pretty sure some of it is regular unrestricted Python
too.  Other than the fact that PyPy isn't named, e.g. "PyRPy" this is just
an implementation detail though--Jython is written in Java, Iron Python is
written in C#, CPython is written in C, and PyPy is written in (R)Python.
 They all run user programs the same though[**]

[*] Yes, possibly there is some weird buggy corner case where it does the
wrong thing, but if so, file a ticket to get it fixed.

[**] For a suitably fuzzy meaning of "the same"--obviously performance
characteristics are going to differ, as are things like access to external
library calls, etc.  It's the same in terms of taking the
same identical source files to run, in any case.


On Thu, Jun 12, 2014 at 10:53 PM, Neil Girdhar <mistersheik at gmail.com>
wrote:

> Well that's great to hear :)   I thought pypy only worked on a restricted
> set of Python.  Does pypy save the optimization statistics between runs?
>
>
> On Fri, Jun 13, 2014 at 1:52 AM, David Mertz <mertz at gnosis.cx> wrote:
>
>> Other a sprinkling of the word "stochastic" around this post (why that
>> word, not the more obvious "random"?), this basically exactly describes
>> what PyPy does.
>>
>>
>> On Thu, Jun 12, 2014 at 9:07 PM, Neil Girdhar <mistersheik at gmail.com>
>> wrote:
>>
>>> I was wondering what work is being done on Python to make it faster.  I
>>> understand that cpython is incrementally improved.  I'm not sure, but I
>>> think that pypy acceleration works by compiling a restricted set of Python.
>>>  And I think I heard something about Guido working on a different model for
>>> accelerating Python.  I apologize in advance that I didn't look into these
>>> projects in a lot of detail.  My number one dream about computer languages
>>> is for me to be able to write in a language as easy as Python and have it
>>> run as quickly as if it were written.  I do believe that this is possible
>>> (since in theory someone could look at my Python code and port it to C++).
>>>
>>> Unfortunately, I don't have time to work on this goal, but I still
>>> wanted to get feedback about some ideas I have about reaching this goal.
>>>
>>> First, I don't think it's important for a "code block" (say, a small
>>> section of code with less coupling to statements outside the block than to
>>> within the block)   to run quickly on its first iteration.
>>>
>>> What I'm suggesting instead is for every iteration of a "code block",
>>> the runtime stochastically decides whether to collect statistics about that
>>> iteration.  Those statistics include the the time running the block, the
>>> time perform attribute accesses including type method lookups and so on.
>>>  Basically, the runtime is trying to guess the potential savings of
>>> optimizing this block.
>>>
>>> If the block is run many times and the potential savings are large, then
>>> stochastically again, the block is promoted to a second-level statistics
>>> collection.   This level collects statistics about all of the external
>>> couplings of the block, like the types and values of the passed-in and
>>> returned values.
>>>
>>> Using the second-level statistics, the runtime can now guess whether the
>>> block should be promoted to a third level whereby any consistencies are
>>> exploited.  For example, if the passed-in parameter types and return value
>>> type of the "min" function are (int, int, int) for 40% of the statistics
>>> and (float, float, float) for 59%, and other types for the remaining 1%,
>>> then two precompiled versions of min are generated: one for int and one for
>>> float.
>>>
>>> These precompiled code blocks have different costs than regular Python
>>> blocks.  They need to pay the following costs:
>>> * a check for the required invariants (parameter types above, but it
>>> could be parameter values, or other invariants)
>>> * they need to install hooks on objects that must remain invariant
>>> during the execution of the block; if the invariants are ever violated
>>> during the execution of the block, then all of the computations done during
>>> this execution of the block must be discarded
>>> * therefore a third cost is the probability of discarded the computation
>>> times the average cost of the doing the wasted computation.
>>>
>>> The saving is that the code block
>>> * can be transformed into a faster bytecode, which includes straight
>>> assembly instructions in some sections since types or values can now be
>>> assumed,
>>> * can use data structures that make type or access assumptions (for
>>> example a list that always contains ints can use a flattened
>>> representation; a large set that is repeatedly having membership checked
>>> with many negative results might benefit from an auxiliary bloom filter,
>>> etc.)
>>>
>>> In summary the runtime performs stochastic, incremental promotion of
>>> code blocks from first-level, to second-level, to multiple precompiled
>>> versions.  It can also demote a code block.  The difference between the
>>> costs of the different levels is statistically estimated.
>>>
>>> Examples of optimizations that can be commonly accomplished using such a
>>> system are:
>>> * global variables are folded into code as constants.  (Even if they
>>> change rarely, you pay the discarding penalty described above plus the
>>> recompilation cost; the benefit of inline use of the constant (and any
>>> constant folding) might outweigh these costs.)
>>> * lookup of member functions, which almost never change
>>> * flattening of homogeneously-typed lists
>>>
>>> Best,
>>>
>>> Neil
>>>
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>>
>>
>>
>> --
>> Keeping medicines from the bloodstreams of the sick; food
>> from the bellies of the hungry; books from the hands of the
>> uneducated; technology from the underdeveloped; and putting
>> advocates of freedom in prisons.  Intellectual property is
>> to the 21st century what the slave trade was to the 16th.
>>
>
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140612/025d1483/attachment-0001.html>


More information about the Python-ideas mailing list