Python Front-end to GCC

Steven D'Aprano steve+comp.lang.python at pearwood.info
Tue Oct 22 08:00:37 EDT 2013


On Tue, 22 Oct 2013 10:14:16 +0100, Oscar Benjamin wrote:

> On 22 October 2013 00:41, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> On Mon, 21 Oct 2013 10:55:10 +0100, Oscar Benjamin wrote:
>>
>>> On 21 October 2013 08:46, Steven D'Aprano <steve at pearwood.info> wrote:
>>
>>>> On the contrary, you have that backwards. An optimizing JIT compiler
>>>> can often produce much more efficient, heavily optimized code than a
>>>> static AOT compiler, and at the very least they can optimize
>>>> different things than a static compiler can. This is why very few
>>>> people think that, in the long run, Nuitka can be as fast as PyPy,
>>>> and why PyPy's ultimate aim to be "faster than C" is not moonbeams:
>>>
>>> That may be true but both the examples below are spurious at best. A
>>> decent AOT compiler would reduce both programs to the NULL program as
>>> noted by haypo:
>>> http://morepypy.blogspot.co.uk/2011/02/pypy-faster-than-c-on-carefully-crafted.html?showComment=1297205903746#c2530451800553246683

Keep in mind that the post's author, Maciej Fijalkowski, is not a native 
English speaker (to the best of my knowledge). You or I would probably 
have called the post a *contrived* example, not a "carefully crafted one" 
-- the meaning is the same, but the connotations are different.

Micro-benchmarks are mostly of theoretical interest, and contrived ones 
even more so, but still of interest. One needs to be careful not to read 
too much into them, but also not to read too little into them.


>> Are you suggesting that gcc is not a decent compiler?
> 
> No.
> 
>> If "optimize away
>> to the null program" is such an obvious thing to do, why doesn't the
>> most popular C compiler in the [FOSS] world do it?
> 
> It does if you pass the appropriate optimisation setting (as shown in
> haypo's comment). I should have been clearer.

"C can do nothing 10 times faster than Python!" -- well, okay, but what 
does that tell you about my long-running web server app? Benchmarks at 
the best of time are only suggestive, benchmarks for null programs are 
even less useful.

The very next comment after haypo's is an answer to his observation: 

    [quote]
    @haypo print the result so the loop don't get removed as dead 
    code. Besides, the problem is really the fact that's -flto is 
    unfair since python imports more resemble shared libraries 
    than statically-compiled files.


I'll be honest, I don't know enough C to really judge that claim, but I 
have noticed that benchmarks rarely compare apples with apples, 
especially when C is involved. You can't eliminate all the differences 
between the code being generated, or at least not easily, since different 
languages have deep-seated differences in semantics that can't be 
entirely eliminated. But you should at least make some effort to compare 
code that does the same thing the same way.

Here's an example: responding to a benchmark showing a Haskell compiler 
generating faster code than a C compiler, somebody re-wrote the C code 
and got the opposite result:

http://jacquesmattheij.com/when-haskell-is-not-faster-than-c

Again, I can't judge the validity of all of the changes he made, but one 
stood out like a sore thumb:

    [quote]
    C does not require you to set static global arrays to ‘0’, so the 
    for loop in the main function can go...

Wait a minute... Haskell, I'm pretty sure, zeroes memory. C doesn't. So 
the C code is now doing less work. Yes, your C compiler will allow you to 
avoid zeroing memory before using it, and you'll save some time 
initially. But eventually[1] you will need to fix the security 
vulnerability by adding code to zero the memory, exactly as Haskell and 
other more secure languages already do. So *not* zeroing the memory is 
cheating. It's not something you'd do in real code, not if you care about 
security and correctness. Even if you don't care about security, you 
should care about benchmarking both languages performing the same amount 
of work.

Now, I may be completely off-base here. Some Haskell expert may chime in 
to say that Haskell does not, in fact, zero memory. But it does 
*something*, I'm sure -- perhaps it tracks which memory is undefined and 
prevents reads from it. Whatever it does, if it does it at runtime, the 
C benchmark had better do the same thing, or it's an unfair comparison:

"Safely drive to the mall obeying all speed limits and traffic signals in 
a Chevy Volt, versus speed down the road running red lights and stop 
signs in a Ford Taurus" -- would it be any surprise that the Taurus is 
faster?

[...]
> They are more than carefully crafted. They are useless and misleading.
> It's reasonable to contrive of a simple CPU-intensive programming
> problem for benchmarking. But the program should do *something* even if
> it is contrived. Both programs here consist *entirely* of dead code.

But since the dead code is *not* eliminated, it is actually executed. If 
it's executed, it's not really dead, is it? Does it really matter that 
you don't do anything with the result? I'm with Maciej on this one -- 
*executing* the code given is faster in PyPy than in C, at least for this 
C compiler. Maybe C is faster to not execute it. Is that really an 
interesting benchmark? "C does nothing ten times faster than PyPy does 
something!"

Given a sufficiently advanced static analyser, PyPy could probably 
special-case programs that do nothing. Then you're in a race to compare 
the speed at which the PyPy runtime environment can start up and do 
nothing, versus a stand-alone executable that has to start up and do 
nothing. If this is a benchmark that people care about, I suggest they 
need to get out more :-)

Ultimately, this is an argument about what counts as a fair apples-to-apples 
comparison, and what doesn't. Some people consider that for a fair test, 
the code has to actually be executed. If you optimize away code and don't 
execute it, that's not a good benchmark. I agree with them. You don't. I 
can see both sides of the argument, and think that they both have 
validity, but on balance agree with the PyPy guys here: a compiler that 
optimizes away "for i = 1 to 1000: pass" to do-nothing is useful, but if 
you wanted to find out the runtime cost of a for-loop, you would surely 
prefer to disable that optimization and time how long it takes the for 
loop to actually run.

The actual point that the PyPy developers keep making is that a JIT 
compiler can use runtime information to perform optimizations which a 
static compiler like gcc cannot, and I haven't seen anyone dispute that 
point. More in the comments here:


    [quote]
    The point here is not that the Python implementation of 
    formatting is better than the C standard library, but that 
    dynamic optimisation can make a big difference. The first 
    time the formatting operator is called its format string is 
    parsed and assembly code for assembling the output generated. 
    The next 999999 times that assembly code is used without 
    doing the parsing step. Even if sprintf were defined locally,
    a static compiler can’t optimise away the parsing step, so 
    that work is done redundantly every time around the loop.

http://morepypy.blogspot.com/2011/08/pypy-is-faster-than-c-again-string.html?showComment=1312357475889#c6708170690935286644


Also possibly of interest:

http://beza1e1.tuxen.de/articles/faster_than_C.html





[1] Probably not until after the Zero Day exploit is released.  

-- 
Steven
