Python Front-end to GCC

Philip Herron herron.philip at googlemail.com
Tue Oct 22 05:32:30 EDT 2013


On Tuesday, 22 October 2013 10:14:16 UTC+1, Oscar Benjamin wrote:
> On 22 October 2013 00:41, Steven D'Aprano wrote:
> >>> On the contrary, you have that backwards. An optimizing JIT compiler
> >>> can often produce much more efficient, heavily optimized code than a
> >>> static AOT compiler, and at the very least they can optimize different
> >>> things than a static compiler can. This is why very few people think
> >>> that, in the long run, Nuitka can be as fast as PyPy, and why PyPy's
> >>> ultimate aim to be "faster than C" is not moonbeams:
> >>
> >> That may be true but both the examples below are spurious at best. A
> >> decent AOT compiler would reduce both programs to the NULL program as
> >> noted by haypo:
> >> http://morepypy.blogspot.co.uk/2011/02/pypy-faster-than-c-on-carefully-crafted.html?showComment=1297205903746#c2530451800553246683
> >
> > Are you suggesting that gcc is not a decent compiler?
>
> No.
>
> > If "optimize away to the null program" is such an obvious thing to do,
> > why doesn't the most popular C compiler in the [FOSS] world do it?
>
> It does if you pass the appropriate optimisation setting (as shown in
> haypo's comment). I should have been clearer.
>
> gcc compiles programs in two phases: compilation and linking.
> Compilation creates the object files x.o and y.o from x.c and y.c.
> Linking creates the output binary a.exe from x.o and y.o. The -O3
> optimisation setting used in the blog post enables optimisation in the
> compilation phase. However each .c file is compiled independently so
> because the add() function is defined in x.c and called in y.c the
> compiler is unable to inline it. It also can't remove it as dead code
> because although it knows that the return value isn't used it doesn't
> know if the call has side effects.
>
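[A minimal single-file sketch, not part of the original exchange: with the
definition and the call in one translation unit, gcc -O3 on its own can
inline add(), notice that nothing ever uses the result, and drop the whole
loop as dead code, without needing -flto. The file name loop.c is just for
illustration.]

/* loop.c -- hypothetical single-file variant of the blog post's benchmark */
static double add(double a, double b)
{
    return a + b;
}

int main(void)
{
    int i = 0;
    double a = 0;
    while (i < 1000000000) {
        a += 1.0;
        add(a, a);   /* return value unused: the call is dead code */
        i++;
    }
    return 0;        /* a and i are never read again, so the loop can go too */
}

/* Compile and inspect the generated assembly; with -O3 the loop should be
   optimised away entirely:
       $ gcc -O3 -S loop.c && less loop.s                                    */
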
> You might think it's silly that gcc can't optimise across source files
> and if so you're right because actually it can if you enable link time
> optimisation with the -flto flag as described by haypo. So if I do
> that with the code from the blog post I get (using mingw gcc 4.7.2 on
> Windows):
>
> $ cat x.c
> double add(double a, double b)
> {
>   return a + b;
> }
> $ cat y.c
> double add(double a, double b);
>
> int main()
> {
>   int i = 0;
>   double a = 0;
>   while (i < 1000000000) {
>     a += 1.0;
>     add(a, a);
>     i++;
>   }
> }
> $ gcc -O3 -flto x.c y.c
> $ time ./a.exe
>
> real    0m0.063s
> user    0m0.015s
> sys     0m0.000s
> $ time ./a.exe  # warm cache
>
> real    0m0.016s
> user    0m0.015s
> sys     0m0.015s
>
> So gcc can optimise this all the way to the null program which takes
> 15ms to run (that's 600 times faster than pypy).
>
> Note that even if pypy could optimise it all the way to the null
> program it would still be 10 times slower than C's null program:
>
> $ touch null.py
> $ time pypy null.py
>
> real    0m0.188s
> user    0m0.076s
> sys     0m0.046s
> $ time pypy null.py  # warm cache
>
> real    0m0.157s
> user    0m0.060s
> sys     0m0.030s
>
> > [...]
> >> So the pypy version takes twice as long to run this. That's impressive
> >> but it's not "faster than C".
>
> (Actually if I enable -flto with that example the C version runs 6-7
> times faster due to inlining.)
>
> > Nobody is saying that PyPy is *generally* capable of making any arbitrary
> > piece of code run as fast as hand-written C code. You'll notice that the
> > PyPy posts are described as *carefully crafted* examples.
>
> They are more than carefully crafted. They are useless and misleading.
> It's reasonable to contrive a simple CPU-intensive programming
> problem for benchmarking. But the program should do *something* even
> if it is contrived. Both programs here consist *entirely* of dead
> code. Yes it's reasonable for the pypy devs to test things like this
> during development. No it's not reasonable to showcase this as an
> example of the potential for pypy to speed up any useful computation.
>
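[Another sketch that is not part of the original exchange: the simplest way
to make the contrived benchmark actually do something is to accumulate the
result and print it, so the work has an observable effect and neither gcc
nor pypy may legally discard it. The file name y2.c is hypothetical; it
reuses add() from x.c above. Under default floating-point rules gcc should
keep the loop, since it cannot reassociate the additions into a closed form.]

/* y2.c -- hypothetical variant of y.c whose result is actually used */
#include <stdio.h>

double add(double a, double b);      /* still defined in x.c */

int main(void)
{
    int i = 0;
    double a = 0;
    double total = 0;
    while (i < 1000000000) {
        a += 1.0;
        total += add(a, a);          /* result now feeds into the output */
        i++;
    }
    printf("%f\n", total);           /* observable side effect */
    return 0;
}

/* Build exactly as before; the loop is no longer dead code:
       $ gcc -O3 -flto x.c y2.c                                              */
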
> > I believe that, realistically, PyPy has potential to bring Python into
> > Java and .Net territories, namely to run typical benchmarks within an
> > order of magnitude of C speeds on the same benchmarks. C is a very hard
> > target to beat, because vanilla C code does *so little* compared to other
> > languages: no garbage collection, no runtime dynamism, very little
> > polymorphism. So benchmarking simple algorithms plays to C's strengths,
> > while ignoring C's weaknesses.
>
> As I said I don't want to criticise PyPy. I've just started using it
> and it is impressive. However both of those blog posts are
> misleading. Not only that but the authors must know exactly why they
> are misleading. Because of that I will take any other claims with a
> big pinch of salt in future.
>
>
> Oscar

You, sir, deserve a medal! I think a lot of people take these sorts of benchmarks completely out of context, and it's great to see such a well-rounded statement.

I applaud you! I've been banging my head against the wall trying to describe this as succinctly as you just did, and couldn't.


