[Python-Dev] Third milestone of FAT Python

Victor Stinner victor.stinner at gmail.com
Fri Dec 4 07:49:24 EST 2015


Hi,

I implemented 3 new optimizations in FAT Python: loop unrolling, constant
folding and copy builtin functions to constants. In the previous thread,
Terry Reedy asked me if the test suite is complete enough to ensure that
FAT Python doesn't break the Python semantic. I can now say that the
second milestone didn't optimize enough code to see bugs, new
optimizations helped me to find a *lot* of bugs which are now all fixed.

The full Python test suite pass with all optimizations enabled. Only two
tests are skipped in FAT Python: test_dynamic and mock tests of
test_unittest. test_dynamic checks that it's possible to replace builtin
functions in a function and then use the replaced builtins from the same
function. FAT Python currently doesn't support the specific case of this
test. The mock tests of test_unittest does something similar, I'm more
concerned by these failures.

This email is an updated version on my previous blog article:
https://haypo.github.io/fat-python-status-nov26-2015.html

Since I wrote this blog article, I implemented the constant folding
optimization and I fixed the two major bugs mentioned in the article
(line number and exec(code, dict)).


Documentation
=============

I combined the documentation of (my) various optimizations projects into
a single documentation:
http://faster-cpython.readthedocs.org/

The FAT Python project has its own page:
http://faster-cpython.readthedocs.org/fat_python.html


Constant folding
================

This optimization propagates constant values of variables. Example:

    def func()
        x = 1
        y = x
        return y

Constant folding:

    def func()
        x = 1
        y = 1
        return 1

This optimization alone is not really exciting. It will more used later
when the optimizer will implement peephole optimizations (ex: a+b) and
remove dead code. For example, it will be possible to remove code
specific to a platform
(ex: 'if sys.platform.startswith("freebsd"): ...').

Later, removal of unused local variables will be implemented to simplify
the code even more. The previous example will be simplified to:

    def func():
        return 1


Loop unrolling optimization
===========================

The optimization generates assignement statements (for the loop index
variable) and duplicates the loop body to reduce the cost of loops. Example:

    def func():
        for i in range(2):
            print(i)

Loop unrolled:

    def func():
        i = 0
        print(i)

        i = 1
        print(i)

If the iterator uses the builtin range function, two guards are
required on builtins and globals namespaces.

The optimization also handles tuple iterator
(ex: "for i in (1, 2, 3): ..."). No guard is needed in this case (the
code is always optimized).

Loop unrolling combines well with constant folding. The previous example
is simplified to:

    def func():
        i = 0
        print(0)

        i = 1
        print(1)

Again, with a future removal of unused local variables optimization, the
previous example will be simplified to:

    def func():
        print(0)
        print(1)


Copy builtins to constants optimization
=======================================

This optimization is currently disabled by default. (Well, in practice,
it's enabled by the site module to test it and detect code which doesn't
work with it.)

The LOAD_GLOBAL instruction is used to load a builtin function. The
instruction requires two dictionary lookup: one in the globals namespace
(which almost always fail) and then in the builtins namespace.

It's rare to replace builtins, so the idea here is to replace the dynamic
LOAD_GLOBAL instruction with a static LOAD_CONST instruction which
loads the function from a C array, a fast O(1) access.

It is not possible to inject a builtin function during the compilation.
Python code objects are serialized by the marshal module which only
support simple types like integers, strings and tuples, not functions.
The trick is to modify the constants at runtime when the module is
loaded. I added a new patch_constants() method to functions.

Example:

    def log(message):
        print(message)

This function is specialized to:

    def log(message):
        'LOAD_GLOBAL print'(message)
    log.patch_constants({'LOAD_GLOBAL print': print})

The specialized bytecode uses two guards on builtins and globals
namespaces to disable the optimization if the builtin function is
replaced.

This optimization doesn't support the case when builtins are modified
while the function is executed. I think that it will be safer to disable
the optimization by default.

Later, we can enhance the optimization to enable it if the function
cannot modify builtins and if it only calls funcions which cannot modify
builtins. I bet that the optimization will be safe with these additional checks.


Changes on builtin guards
=========================

When a guard is used on a builtin function, the specialization of the
function is now ignored if the builtin was replaced or if a function
with the name already exists in the globals namespace.

At the end of the Python finalization (after the site module is
imported), the fat module keeps a private copy of builtins. When a
builtin guard is used, the current builtin function is simply compared
to the old copy old builtins.

The assumption here is that builtin functions are not replaced during
Python initialization.

By the way, I started to document FAT PYthon limitations and effects
on the Python semantic:
http://faster-cpython.readthedocs.org/fat_python.html#limitations-and-python-semantic


Lot of enhancements of the AST optimizer
========================================

New optimizations helped to find bugs in the AST optimizer. Many fixes
and various enhancements were done in the AST optimizer.

The optimizer was optimized, copy.deepcopy() is no more used to
duplicate a full tree.  The new NodeTransformer class only duplicates
modified nodes.

The optimizer understands now much better Python namespaces (globals,
locals, non locals, etc.).

It is now able to optimize a function without guards: it's used to
unroll a loop using a tuple as iterator.


Versionned dictionary
=====================

In the previous milestone of FAT Python, the versionned dictionary was a
new type inherited from the builtin dict type which added a __version__
read-only (global "version" of dictionary, incremented at each change),
a getversion(key) method (version of a one key) and it added support for
weak references.

I done my best to make FAT Python changes optional to leave CPython
completly unchanged to not hit performances when the FAT mode is not
used. But I had two major technical issues.

The first one is that using a different structure for dictionary entries
would make the dict code more complex and maybe even slower (which is
not acceptable).

The second one is that it was no more possible to call exec(code,
globals, locals) in FAT mode where globals or locals were a dict. The
code needed to be modified using something like:
   globals = fat.verdict() if __fat__ or {}
It required to import the fat module and modify all code calling exec().

I removed the fat.verdict type and added the __version__ property to the
builtin dict type. It's incremented at each change. The getversion()
method and the support for weak reference is removed. Python has already
special code to handle reference cycles of dictionaries, there is no
need to support weak references. Guards now use strong references to namespaces.

Victor


More information about the Python-Dev mailing list