Python Front-end to GCC

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Oct 22 05:14:16 EDT 2013


On 22 October 2013 00:41, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Mon, 21 Oct 2013 10:55:10 +0100, Oscar Benjamin wrote:
>
>> On 21 October 2013 08:46, Steven D'Aprano <steve at pearwood.info> wrote:
>
>>> On the contrary, you have that backwards. An optimizing JIT compiler
>>> can often produce much more efficient, heavily optimized code than a
>>> static AOT compiler, and at the very least they can optimize different
>>> things than a static compiler can. This is why very few people think
>>> that, in the long run, Nuitka can be as fast as PyPy, and why PyPy's
>>> ultimate aim to be "faster than C" is not moonbeams:
>>
>> That may be true but both the examples below are spurious at best. A
>> decent AOT compiler would reduce both programs to the NULL program as
>> noted by haypo:
>> http://morepypy.blogspot.co.uk/2011/02/pypy-faster-than-c-on-carefully-crafted.html?showComment=1297205903746#c2530451800553246683
>
> Are you suggesting that gcc is not a decent compiler?

No.

> If "optimize away
> to the null program" is such an obvious thing to do, why doesn't the most
> popular C compiler in the [FOSS] world do it?

It does if you pass the appropriate optimisation setting (as shown in
haypo's comment). I should have been clearer.

gcc compiles programs in two phases: compilation and linking.
Compilation creates the object files x.o and y.o from x.c and y.c.
Linking creates the output binary a.exe from x.o and y.o. The -O3
optimisation setting used in the blog post enables optimisation in the
compilation phase. However, each .c file is compiled independently, so
because the add() function is defined in x.c and called in y.c the
compiler is unable to inline it. Nor can it remove the call as dead
code: although it knows that the return value isn't used, it doesn't
know whether the call has side effects.

You might think it's silly that gcc can't optimise across source
files, and you'd be right: it can, if you enable link-time
optimisation with the -flto flag as described by haypo. So if I do
that with the code from the blog post I get (using mingw gcc 4.7.2 on
Windows):

$ cat x.c
double add(double a, double b)
{
  return a + b;
}
$ cat y.c
double add(double a, double b);

int main()
{
  int i = 0;
  double a = 0;
  while (i < 1000000000) {
    a += 1.0;
    add(a, a);
    i++;
  }
}
$ gcc -O3 -flto x.c y.c
$ time ./a.exe

real    0m0.063s
user    0m0.015s
sys     0m0.000s
$ time ./a.exe  # warm cache

real    0m0.016s
user    0m0.015s
sys     0m0.015s

So gcc can optimise this all the way down to the null program, which
takes 15ms to run (that's 600 times faster than pypy).

Note that even if pypy could optimise it all the way to the null
program, it would still be 10 times slower than C's null program:

$ touch null.py
$ time pypy null.py

real    0m0.188s
user    0m0.076s
sys     0m0.046s
$ time pypy null.py  # warm cache

real    0m0.157s
user    0m0.060s
sys     0m0.030s

> [...]
>> So the pypy version takes twice as long to run this. That's impressive
>> but it's not "faster than C".

(Actually if I enable -flto with that example the C version runs 6-7
times faster due to inlining.)

> Nobody is saying that PyPy is *generally* capable of making any arbitrary
> piece of code run as fast as hand-written C code. You'll notice that the
> PyPy posts are described as *carefully crafted* examples.

They are more than carefully crafted: they are useless and misleading.
It's reasonable to contrive a simple CPU-intensive programming
problem for benchmarking, but the program should do *something*, even
if it is contrived. Both programs here consist *entirely* of dead
code. Yes, it's reasonable for the pypy devs to test things like this
during development. No, it's not reasonable to showcase this as an
example of the potential for pypy to speed up any useful computation.

> I believe that, realistically, PyPy has potential to bring Python into
> Java and .Net territories, namely to run typical benchmarks within an
> order of magnitude of C speeds on the same benchmarks. C is a very hard
> target to beat, because vanilla C code does *so little* compared to other
> languages: no garbage collection, no runtime dynamism, very little
> polymorphism. So benchmarking simple algorithms plays to C's strengths,
> while ignoring C's weaknesses.

As I said, I don't want to criticise PyPy. I've just started using it
and it is impressive. However, both of those blog posts are
misleading, and the authors must know exactly why they are
misleading. Because of that I will take any other claims with a big
pinch of salt in future.


Oscar
