Python Front-end to GCC

Philip Herron herron.philip at googlemail.com
Tue Oct 22 05:32:30 EDT 2013


On Tuesday, 22 October 2013 10:14:16 UTC+1, Oscar Benjamin wrote:
> On 22 October 2013 00:41, Steven D'Aprano wrote:
> >>> On the contrary, you have that backwards. An optimizing JIT compiler
> >>> can often produce much more efficient, heavily optimized code than a
> >>> static AOT compiler, and at the very least they can optimize different
> >>> things than a static compiler can. This is why very few people think
> >>> that, in the long run, Nuitka can be as fast as PyPy, and why PyPy's
> >>> ultimate aim to be "faster than C" is not moonbeams:
> >>
> >> That may be true but both the examples below are spurious at best. A
> >> decent AOT compiler would reduce both programs to the NULL program as
> >> noted by haypo:
> >> http://morepypy.blogspot.co.uk/2011/02/pypy-faster-than-c-on-carefully-crafted.html?showComment=1297205903746#c2530451800553246683
> >
> > Are you suggesting that gcc is not a decent compiler?
>
> No.
>
> > If "optimize away to the null program" is such an obvious thing to do,
> > why doesn't the most popular C compiler in the [FOSS] world do it?
>
> It does if you pass the appropriate optimisation setting (as shown in
> haypo's comment). I should have been clearer.
>
> gcc compiles programs in two phases: compilation and linking.
> Compilation creates the object files x.o and y.o from x.c and y.c.
> Linking creates the output binary a.exe from x.o and y.o. The -O3
> optimisation setting used in the blog post enables optimisation in the
> compilation phase. However each .c file is compiled independently so
> because the add() function is defined in x.c and called in y.c the
> compiler is unable to inline it. It also can't remove it as dead code
> because although it knows that the return value isn't used it doesn't
> know if the call has side effects.
>
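[A minimal single-file sketch, not part of the original exchange: with the
definition and the call in one translation unit, gcc -O3 on its own can
inline add(), notice that nothing ever uses the result, and drop the whole
loop as dead code, without needing -flto. The file name loop.c is just for
illustration.]

/* loop.c -- hypothetical single-file variant of the blog post's benchmark */
static double add(double a, double b)
{
    return a + b;
}

int main(void)
{
    int i = 0;
    double a = 0;
    while (i < 1000000000) {
        a += 1.0;
        add(a, a);   /* return value unused: the call is dead code */
        i++;
    }
    return 0;        /* a and i are never read again, so the loop can go too */
}

/* Compile and inspect the generated assembly; with -O3 the loop should be
   optimised away entirely:
       $ gcc -O3 -S loop.c && less loop.s                                    */
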
> You might think it's silly that gcc can't optimise across source files
> and if so you're right because actually it can if you enable link time
> optimisation with the -flto flag as described by haypo. So if I do
> that with the code from the blog post I get (using mingw gcc 4.7.2 on
> Windows):
>
> $ cat x.c
> double add(double a, double b)
> {
>   return a + b;
> }
> $ cat y.c
> double add(double a, double b);
>
> int main()
> {
>   int i = 0;
>   double a = 0;
>   while (i < 1000000000) {
>     a += 1.0;
>     add(a, a);
>     i++;
>   }
> }
> $ gcc -O3 -flto x.c y.c
> $ time ./a.exe
>
> real    0m0.063s
> user    0m0.015s
> sys     0m0.000s
> $ time ./a.exe  # warm cache
>
> real    0m0.016s
> user    0m0.015s
> sys     0m0.015s
>
> So gcc can optimise this all the way to the null program which takes
> 15ms to run (that's 600 times faster than pypy).
>
> Note that even if pypy could optimise it all the way to the null
> program it would still be 10 times slower than C's null program:
>
> $ touch null.py
> $ time pypy null.py
>
> real    0m0.188s
> user    0m0.076s
> sys     0m0.046s
> $ time pypy null.py  # warm cache
>
> real    0m0.157s
> user    0m0.060s
> sys     0m0.030s
>
> > [...]
> >> So the pypy version takes twice as long to run this. That's impressive
> >> but it's not "faster than C".
>
> (Actually if I enable -flto with that example the C version runs 6-7
> times faster due to inlining.)
>
> > Nobody is saying that PyPy is *generally* capable of making any arbitrary
> > piece of code run as fast as hand-written C code. You'll notice that the
> > PyPy posts are described as *carefully crafted* examples.
>
> They are more than carefully crafted. They are useless and misleading.
> It's reasonable to contrive a simple CPU-intensive programming
> problem for benchmarking. But the program should do *something* even
> if it is contrived. Both programs here consist *entirely* of dead
> code. Yes it's reasonable for the pypy devs to test things like this
> during development. No it's not reasonable to showcase this as an
> example of the potential for pypy to speed up any useful computation.
>
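[Another sketch that is not part of the original exchange: the simplest way
to make the contrived benchmark actually do something is to accumulate the
result and print it, so the work has an observable effect and neither gcc
nor pypy may legally discard it. The file name y2.c is hypothetical; it
reuses add() from x.c above. Under default floating-point rules gcc should
keep the loop, since it cannot reassociate the additions into a closed form.]

/* y2.c -- hypothetical variant of y.c whose result is actually used */
#include <stdio.h>

double add(double a, double b);      /* still defined in x.c */

int main(void)
{
    int i = 0;
    double a = 0;
    double total = 0;
    while (i < 1000000000) {
        a += 1.0;
        total += add(a, a);          /* result now feeds into the output */
        i++;
    }
    printf("%f\n", total);           /* observable side effect */
    return 0;
}

/* Build exactly as before; the loop is no longer dead code:
       $ gcc -O3 -flto x.c y2.c                                              */
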
> > I believe that, realistically, PyPy has potential to bring Python into
> > Java and .Net territories, namely to run typical benchmarks within an
> > order of magnitude of C speeds on the same benchmarks. C is a very hard
> > target to beat, because vanilla C code does *so little* compared to other
> > languages: no garbage collection, no runtime dynamism, very little
> > polymorphism. So benchmarking simple algorithms plays to C's strengths,
> > while ignoring C's weaknesses.
>
> As I said I don't want to criticise PyPy. I've just started using it
> and it is impressive. However both of those blog posts are
> misleading. Not only that but the authors must know exactly why they
> are misleading. Because of that I will take any other claims with a
> big pinch of salt in future.
>
>
> Oscar

You, sir, deserve a medal! I think a lot of people take these sorts of benchmarks completely out of context, and it's great to see such a well-rounded statement.

I applaud you! I've been banging my head against the wall trying to describe this as succinctly as you just did, and couldn't.


