[pypy-dev] Updated 'High Performance Python' tutorial (the one from EuroPython 2011)

Tue Nov 15 15:54:07 CET 2011

Hi Antonio! Apologies for the slow reply; this got filed into a subfolder.

The numbers are interesting, and I'm also interested in the C version.
I'm hoping that my tutorial will be accepted for PyCon next March (the
talks are announced in two weeks); assuming I get to talk again, I'll
update my tutorial. Adding more on PyPy and having a C equivalent
will be very useful.
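
For reference while reading the quoted timings below, here is a minimal
sketch that recomputes the speed ratios from the reported times. The
dictionary names and the ratio() helper are made up for illustration;
the numbers themselves are copied from Antonio's mail.

```python
# Wall-clock timings in seconds, copied from the quoted mail below.
default_region = {"PyPy": 1.95, "Cython": 0.58, "Shedskin": 0.42}  # args "1000 1000"
zoomed_region = {"PyPy": 0.89, "Cython": 1.76, "Shedskin": 0.26}   # args "3000 3000"


def ratio(slower, faster):
    """Return how many times longer `slower` took than `faster`."""
    return slower / faster


# Default region: PyPy compared with Shedskin.
print("default, PyPy/Shedskin: %.2fx" % ratio(default_region["PyPy"],
                                              default_region["Shedskin"]))
# Zoomed region: Cython compared with PyPy, and PyPy compared with Shedskin.
print("zoomed, Cython/PyPy:    %.2fx" % ratio(zoomed_region["Cython"],
                                              zoomed_region["PyPy"]))
print("zoomed, PyPy/Shedskin:  %.2fx" % ratio(zoomed_region["PyPy"],
                                              zoomed_region["Shedskin"]))
```

The rounded figures in the mail (~4.5x, ~2x, ~3.5x) are approximations of
these ratios.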

Given that the C version should be very similar to the ShedSkin
version, maybe it just comes down to compiler differences? On my
MacBook (where I originally wrote the talk) I think the differences in
speed came from two versions of gcc: Cython seemed to prefer one and
ShedSkin the other, and I ran out of time trying to unify that test. Do
you definitely use the same optimisation flags? ShedSkin (from memory)
requests fast math and a few other things in the generated Makefile.

Ian.

On 7 November 2011 18:04, Antonio Cuni <anto.cuni at gmail.com> wrote:
> Hello Ian,
>
> On 25/07/11 11:00, Ian Ozsvald wrote:
>>
>> Dear all, I've published v0.2 of my High Performance Python tutorial
>> write-up from the session I ran at EuroPython:
>>
>> http://ianozsvald.com/2011/07/25/high-performance-python-tutorial-v0-2-from-europython-2011/
>
> Today Armin and I looked a bit more into the performance of the
> Mandelbrot algorithm that you wrote for your tutorial.  What we found is
> very interesting :-).
>
> We compared three versions of the code:
>
> - a (slightly modified) pure python one on PyPy
> - the Cython one using calculate_z.pyx_2_bettermath
> - the shedskin one, using shedskin2.py
>
> The PyPy version looks like this:
>
> def calculate_z_serial_purepython(q, maxiter, z):
>    """Pure python with complex datatype, iterating over list of q and z"""
>    output = [0] * len(q)
>    for i in range(len(q)):
>        zi = z[i]
>        qi = q[i]
>        for iteration in range(maxiter):
>            zi = zi * zi + qi
>            if (zi.real*zi.real + zi.imag*zi.imag) > 4.0:
>                output[i] = iteration
>                break
>    return output
>
> i.e., it is exactly the same as pure_python_2.py, except that we avoid
> using abs(zi), so it is comparable with the Cython and Shedskin versions.
>
> First, we ran the programs passing "1000 1000" as arguments, and these
> are the results:
>
> PyPy: 1.95 secs
> Cython: 0.58 secs
> Shedskin: 0.42 secs
>
> so, PyPy is ~4.5x slower than Shedskin.
>
> However, we realized that with the default values for x1,x2,y1,y2 the
> innermost loop runs very few iterations most of the time, and this is a
> case in which PyPy suffers most, because it needs to go through a bridge to
> continue the execution, and at the moment bridges are slower than loops.
>
> So, we changed the values of x1,x2,y1,y2 to compute a different region, in
> which the innermost loop runs for more iterations.  We used these values:
> x1, x2, y1, y2 = 0.37865401-0.02, 0.37865401+0.02, 0.669227668-0.02,
> 0.669227668+0.02
>
> and since all the programs compute this image faster, we used "3000 3000"
> as arguments on the command line.  These are the results:
>
> PyPy: 0.89 secs
> Cython: 1.76 secs
> Shedskin: 0.26 secs
>
> So, in this case, PyPy is ~2x faster than Cython and ~3.5x slower than
> Shedskin.
>
> In the meantime, Armin wrote a C version of it:
> http://paste.pocoo.org/raw/504216/
>
> which took 0.946 seconds to complete. This is in line with PyPy's
> result, but we are still investigating why the Shedskin version is so much
> faster.
>
> ciao,
> Anto
>
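
A small sketch of how the quoted function behaves on points whose escape
time can be checked by hand, and of why the squared-magnitude test can
stand in for abs(zi) > 2.0. The function body is copied from the mail
above; the sample points and the escapes_abs()/escapes_squared() helpers
are made up for illustration.

```python
def calculate_z_serial_purepython(q, maxiter, z):
    """Copied from the quoted mail: escape iteration per point (0 if none)."""
    output = [0] * len(q)
    for i in range(len(q)):
        zi = z[i]
        qi = q[i]
        for iteration in range(maxiter):
            zi = zi * zi + qi
            if (zi.real * zi.real + zi.imag * zi.imag) > 4.0:
                output[i] = iteration
                break
    return output


def escapes_abs(zi):
    """Escape test as written with abs(), which needs a square root."""
    return abs(zi) > 2.0


def escapes_squared(zi):
    """Same test on the squared magnitude, with no square root."""
    return (zi.real * zi.real + zi.imag * zi.imag) > 4.0


# Hand-checkable points (illustrative only):
#   c = 0  and c = -1 never escape, so the output stays 0
#   c = 1  goes 1, 2, 5 and escapes at iteration index 2
#   c = 3  escapes at iteration index 0, which is also stored as 0
q = [0j, 1 + 0j, 3 + 0j, -1 + 0j]
z = [0j] * len(q)
result = calculate_z_serial_purepython(q, 50, z)
print(result)

# The two escape tests agree on these sample values.
samples = [0j, 1 + 1j, 2 + 0j, 1.5 + 1.5j, -3 + 0.1j]
print(all(escapes_abs(s) == escapes_squared(s) for s in samples))
```

Note that an output of 0 is ambiguous here: it covers both points that
escape on the very first iteration and points that never escape within
maxiter.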

-- 
Ian Ozsvald (A.I. researcher)
ian at IanOzsvald.com

http://IanOzsvald.com
http://MorConsulting.com/
http://StrongSteam.com/
http://SocialTiesApp.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald