[Python-Dev] FAT Python (lack of) performance

Mon Jan 25 13:16:29 EST 2016

Hi,

Summary: FAT Python is not faster, but it will be ;-)

--

When I started FAT Python as a fork of CPython 3.6, I put
everything in the same repository. Over the last few weeks, I focused
on splitting my giant patch (10k lines) into small reviewable patches.
I wrote 3 PEPs (509 dict version, 510 function specialization, 511
code transformers) and I enhanced the API to make it usable for more
use cases than just FAT Python. I also created the fatoptimizer (the
AST optimizer) and fat (runtime dependency of the optimizer) projects
on GitHub to clearly separate what should live outside the Python
core. For all links, see:

   http://faster-cpython.readthedocs.org/fat_python.html
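
To illustrate the idea behind function specialization with guards,
here is a minimal pure-Python sketch. It is not the fat module API nor
the C API proposed in PEP 510: the GuardedFunction class and the guard
function are hypothetical names invented for this example, and the
guard checks object identity instead of a dict version.

```python
import builtins

# Remember the builtin that the specialized code relies on.
_ORIGINAL_LEN = builtins.len


def func_original():
    # Generic version: looks up and calls len() at runtime.
    return len("abc")


def func_specialized():
    # Specialized version: len("abc") was precomputed to 3.
    return 3


def len_is_unchanged():
    # Guard: the specialized code is only valid while builtins.len
    # is still the original builtin.
    return builtins.len is _ORIGINAL_LEN


class GuardedFunction:
    """Hypothetical wrapper: call the specialized code if all guards pass."""

    def __init__(self, original, specialized, guards):
        self.original = original
        self.specialized = specialized
        self.guards = guards

    def __call__(self, *args, **kwargs):
        if all(guard() for guard in self.guards):
            return self.specialized(*args, **kwargs)
        # A guard failed: fall back to the original, always-correct code.
        return self.original(*args, **kwargs)


func = GuardedFunction(func_original, func_specialized, [len_is_unchanged])
print(func())
```

If builtins.len is replaced at runtime, the guard fails and the
wrapper falls back to the original function, so the semantics are
preserved.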

For the fatoptimizer project, my constraint is to be able to run the
full Python test suite unmodified. In practice, I have to disable some
optimizations by adding a "__fatoptimizer__ = {...}" configuration to
some test files. For example, I have to disable constant folding on
test_bool because it tests that False+2 gives 2 at runtime, whereas
the optimizer directly replaces False+2 with 2 at compile time.
Well, test_bool.py is not the best example because all tests pass with
the constant folding optimization enabled (if I comment out my
"__fatoptimizer__ = {...}" change).
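
As an illustration of constant folding at the AST level, here is a
minimal sketch built on the standard ast module. It is not the
fatoptimizer implementation: the ConstantFolder class and the fold()
helper are hypothetical names, and only +, - and * on integer (and
bool) constants are handled.

```python
import ast
import operator

# Operators this sketch knows how to fold.
_FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        # Fold the operands first, so nested expressions collapse too.
        self.generic_visit(node)
        op = _FOLDABLE.get(type(node.op))
        if (
            op is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            # bool is a subclass of int, so False + 2 qualifies.
            and isinstance(node.left.value, int)
            and isinstance(node.right.value, int)
        ):
            value = op(node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=value), node)
        return node


def fold(source):
    """Parse an expression and return its AST with constants folded."""
    tree = ast.parse(source, mode="eval")
    tree = ConstantFolder().visit(tree)
    return ast.fix_missing_locations(tree)


tree = fold("False + 2")
print(ast.dump(tree.body))
print(eval(compile(tree, "<folded>", "eval")))
```

After folding, the addition no longer exists in the compiled code,
which is why a test that wants to exercise False+2 at runtime has to
opt out of the optimization.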

This constraint ensures that the optimizer "works" and doesn't break
(too much ;-)) the Python semantics, but it's more difficult to
implement powerful optimizations.

I also found and fixed various kinds of bugs. In my code obviously,
but also in the Python core, in various places. Some bugs only concern
AST transformers, which are a new feature, but I had to fix them. For
example, Python didn't support negative line number deltas in
co_lnotab of code objects, so line numbers were all wrong in
optimized code. I merged my enhancement into the default branch of
CPython (issue #26107).
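
To show why signed deltas matter, here is a simplified sketch of
decoding a table of (bytecode offset delta, line delta) byte pairs in
which the line delta is read as a signed byte. The function name and
the sample table are made up for this example, and the real co_lnotab
decoding in CPython has additional details that are not reproduced
here.

```python
def decode_line_table(table, first_line):
    """Return (bytecode_offset, line_number) pairs from a delta table.

    Simplified sketch: each entry is two bytes, an unsigned offset
    delta followed by a line delta interpreted as a signed byte.
    """
    offset = 0
    line = first_line
    result = [(offset, line)]
    for i in range(0, len(table), 2):
        offset_delta = table[i]
        line_delta = table[i + 1]
        if line_delta >= 0x80:
            # Interpret the byte as signed: 0xFF means -1, and so on.
            line_delta -= 0x100
        offset += offset_delta
        line += line_delta
        result.append((offset, line))
    return result


# Hypothetical table: +4 bytes / +2 lines, then +6 bytes / -1 line.
sample = bytes([4, 2, 6, 0xFF])
print(decode_line_table(sample, first_line=10))
# prints [(0, 10), (4, 12), (10, 11)]
```

With unsigned line deltas, the second entry could not express "go
back one line", which is what happens when an optimizer moves
instructions around.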

In short, I focused on having something working (respecting the Python
semantics), rather than spending time on writing optimizations.

--

When I asked explicitly "Is someone opposed to this PEP 509 [dict
version] ?", Barry Warsaw answered that a performance analysis is
required. Excerpt from his mail:

   "I still think this is maintenance and potential performance
overhead we don't want to commit to long term unless it enables
significant optimization.  Since you probably can't prove that without
some experimentation, this API should be provisional."

Last week, I ran some benchmarks and I have to admit that I was
disappointed. Not only does fatoptimizer not make Python faster, but
it makes it much slower on some tests!

   http://fatoptimizer.readthedocs.org/en/latest/benchmarks.html

I quickly identified a major performance issue when nested functions
are specialized, especially in Lib/json/encoder.py (tested by the
bm_json_v2.py benchmark). I fixed my optimizer to not specialize
nested functions anymore. This simple change fixed the main
performance issue. Reminder: in performance critical code, don't use
nested functions! I may propose patches for Lib/json/encoder.py
to stop using nested functions.
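
One likely reason nested functions are costly: the inner "def" runs
on every call of the outer function and creates a new function
object, so any per-function work has to be repeated each time. The
sketch below only measures the function creation cost in plain
CPython; the function names are hypothetical and no timing result is
claimed.

```python
import timeit


def encode_nested(values):
    # The inner function object is re-created on every call.
    def fmt(value):
        return str(value)

    return [fmt(value) for value in values]


def _fmt(value):
    # Module-level helper: created once, at import time.
    return str(value)


def encode_flat(values):
    return [_fmt(value) for value in values]


data = [1, 2, 3]
nested = timeit.timeit(lambda: encode_nested(data), number=100_000)
flat = timeit.timeit(lambda: encode_flat(data), number=100_000)
print(f"nested: {nested:.3f}s  flat: {flat:.3f}s")
```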

I only ran benchmarks with the optimizer enabled. I now have to
measure the overhead of my patches (PEP 509, 510 and 511), which add
the API for AST optimizers. The overhead must be negligible. For me,
it's a requirement of the whole project. Changes must not make Python
slower when the optimizer is not used.

fatoptimizer is faster on microbenchmarks, but I had to write some
optimizations manually:

   http://fatoptimizer.readthedocs.org/en/latest/microbenchmarks.html

IMHO fatoptimizer is not faster on macro benchmarks because it is not
smart enough (yet) to generate the most interesting optimizations,
like function inlining and specialization for argument types. You can
estimate the potential speedup by manually specializing your
functions.
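
For instance, the effect of function inlining can be estimated by
inlining a helper by hand and timing both versions. This is only a
sketch with hypothetical function names; the numbers depend on the
machine and the Python version, so no result is shown.

```python
import timeit


def square(x):
    return x * x


def sum_squares(n):
    # Generic version: one function call per iteration.
    total = 0
    for i in range(n):
        total += square(i)
    return total


def sum_squares_inlined(n):
    # Manually "optimized" version: square() is inlined by hand.
    total = 0
    for i in range(n):
        total += i * i
    return total


generic = timeit.timeit(lambda: sum_squares(1000), number=200)
inlined = timeit.timeit(lambda: sum_squares_inlined(1000), number=200)
print(f"generic: {generic:.3f}s  inlined: {inlined:.3f}s")
```

The inlined version is only equivalent as long as square() is not
replaced at runtime, which is exactly the kind of assumption a guard
has to check.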

--

Barry also wrote: "Did you address my suggestion on python-ideas to
make the new C API optionally compiled in?"

Well, it is an option, but I would prefer to have the API for AST
optimizers built directly into Python.

The first beta version of Python 3.6 is scheduled for September 2016
(deadline for new features in Python 3.6), so I still have a few
months to implement more powerful optimizations and prove that it can
be faster ;-)

Victor