[Python-Dev] Speeding up CPython 5-10%

Damien George damien.p.george at gmail.com
Wed Jan 27 16:20:27 EST 2016


Hi Yury,

(Sorry for misspelling your name previously!)

> Yes, we'll need to add CALL_METHOD{_VAR|_KW|etc} opcodes to optimize all
> kind of method calls.  However, I'm not sure how big the impact will be,
> need to do more benchmarking.

I never did such fine-grained analysis with MicroPython.  I don't
think there are enough uses of * and ** for it to be worth it, but
there are definitely lots of uses of plain keyword arguments.  Also,
you'd want to consider how simple or complex it is to handle all
these different opcodes in the compiler.  For us, it's simpler to
treat everything the same.  Otherwise the LOAD_METHOD part of your
compiler will need to peek deep into the AST to see what kind of call
it is.
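
To illustrate what I mean, here's a rough sketch of the kind of
inspection needed to classify a call site.  It uses CPython's ast
module rather than our actual compiler, and the labels are just
illustrative, not real opcode names:

    # Rough sketch (not the MicroPython compiler): classify call sites
    # by walking the AST with CPython's ast module (Python 3.5+).
    import ast

    def classify_call(node):
        # Return an illustrative label for the kind of call opcode needed.
        has_star = any(isinstance(a, ast.Starred) for a in node.args)
        has_dstar = any(kw.arg is None for kw in node.keywords)  # ** unpacking
        if has_star or has_dstar:
            return "call with */** unpacking"
        if node.keywords:
            return "call with keyword arguments"
        if isinstance(node.func, ast.Attribute):
            return "method call (LOAD_METHOD candidate)"
        return "plain positional call"

    tree = ast.parse("obj.method(1, x=2); f(*args); g(1, 2)")
    for n in ast.walk(tree):
        if isinstance(n, ast.Call):
            print(classify_call(n))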

> BTW, how do you benchmark MicroPython?

Haha, good question!  Well, we use Pystone 1.2 (unmodified) to do
basic benchmarking, and find it to be quite good.  We track our code
live at:

http://micropython.org/resources/code-dashboard/

You can see the red line there, which is the Pystone result.  There
was a big jump around Jan 2015, which is when we introduced opcode
dictionary caching, and since then it has been gradually increasing
due to small optimisations here and there.

Pystone is actually a great benchmark for embedded systems because it
gives very reliable results there (almost zero variation across
runs).  If a change squeezes out 5 more Pystones, we know it's a good
optimisation (for speed, at least).
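
In case it's useful, this is roughly how we drive it (assuming the
unmodified pystone.py, e.g. from CPython's Lib/test directory, is
importable; the loop count is arbitrary):

    # Run the standard Pystone 1.2 benchmark and report the result.
    from pystone import pystones

    LOOPS = 50000
    benchtime, stones = pystones(LOOPS)
    print("Pystone(1.2) time for %d passes = %g s" % (LOOPS, benchtime))
    print("This machine benchmarks at %g pystones/second" % stones)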

For us, low RAM usage and small code size are the most important
factors, and we track those meticulously.  But in fact, smaller code
size quite often correlates with faster code, because there's less to
execute and more of it fits in the CPU cache (at least on the
desktop).

We do have some other benchmarks, but they are highly specialised for
our use case.  For example: how fast can you bit-bang a GPIO pin
using pure Python code?  Currently we get around 200 kHz on a 168 MHz
MCU, which shows that pure (Micro)Python code is about 100 times
slower than C.
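
Something like the following is what we time (pyboard-style API; the
pin name 'X1' is just an example), with the toggle frequency read off
an oscilloscope:

    # Pure-Python GPIO bit-bang loop, pyboard style.
    import pyb

    pin = pyb.Pin('X1', pyb.Pin.OUT_PP)
    while True:
        pin.high()   # one high/low pair per loop iteration
        pin.low()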

> That's a neat idea!  You're right, it does require bytecode to become
> writeable.  I considered implementing a similar strategy, but this would
> be a big change for CPython.  So I decided to minimize the impact of the
> patch and leave the opcodes untouched.

I think you need to consider "big" changes, especially ones like this
that can have a large (and positive) impact.  But really, this is a
behind-the-scenes change that *should not* affect end users, so you
should not have any second thoughts about making it.  One problem I
see with CPython is that it exposes far too much to the user (both
the Python programmer and the C extension writer), and this hurts
both language evolution (you constantly need to provide backwards
compatibility) and the ability to optimise.
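
To make the caching idea concrete, here's a toy sketch.  It's not the
real MicroPython or CPython machinery, and it caches the looked-up
value directly (a real implementation caches the dict entry or index
and handles invalidation), but it shows the in-place rewrite:

    # Toy interpreter whose LOAD_NAME opcode rewrites itself in place
    # after the first dictionary lookup (inline caching).
    LOAD_NAME, LOAD_NAME_CACHED, PRINT, END = range(4)

    def run(code, names, globs):
        # code is a mutable list of [opcode, arg] pairs
        pc = 0
        stack = []
        while True:
            op, arg = code[pc]
            if op == LOAD_NAME:
                value = globs[names[arg]]              # slow dict lookup...
                code[pc] = [LOAD_NAME_CACHED, value]   # ...then rewrite in place
                stack.append(value)
            elif op == LOAD_NAME_CACHED:
                stack.append(arg)                      # fast path: no lookup
            elif op == PRINT:
                print(stack.pop())
            elif op == END:
                return
            pc += 1

    bytecode = [[LOAD_NAME, 0], [PRINT, None], [END, None]]
    run(bytecode, names=["x"], globs={"x": 42})   # first pass: dict lookup
    run(bytecode, names=["x"], globs={"x": 42})   # second pass: cached slot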

Cheers,
Damien.

