[Cython] GSoC Proposal - Reimplement C modules in CPython's standard library in Cython.

Arthur de Souza Ribeiro arthurdesribeiro at gmail.com
Sun Apr 17 20:07:38 CEST 2011


2011/4/15 Stefan Behnel <stefan_ml at behnel.de>

> [please avoid top-posting]
>
> Arthur de Souza Ribeiro, 15.04.2011 04:31:
>
>> I've created the .pyx files and it passed in all python tests.
>>
>
> Fine.
>
> As far as I can see, you only added static types in some places. Did you
> test if they are actually required (maybe using "cython -a")? Some of them
> look rather counterproductive and should lead to a major slow-down.


In fact, I hadn't, but after you told me to, I ran cython -a and removed
some unnecessary types.


> I added comments to your initial commit.
>

Hi Stefan, about your first comment, "And it's better to let Cython know
that this name refers to a function.", on line 69 of the encoder.pyx file:
I didn't quite understand what it means. Could you explain it in more
detail?

As for the other comments, I think I have addressed them all. If there is
any problem with them, or anything else, please tell me and I'll try to
fix it.


> Note that it's not obvious from your initial commit what you actually
> changed. It would have been better to import the original file first, rename
> it to .pyx, and then commit your changes.
>

I created a directory named 'Diff files' where I put the output of the
'diff' command that I ran on my computer. If you think it would still be
better to commit the original files first and then apply my changes, that's
no problem for me.


>
> It appears that you accidentally added your .c and .so files to your repo.
>
>
> https://github.com/arthursribeiro/JSON-module
>
>
Removed them.


>
>> To test them, as I said, I copied the .py test files to my project
>> directory, generated the .so files, import them instead of python modules
>> and run. I run every test file and it passed in all of them. To run the
>> tests, run the file 'run-tests.sh'
>>
>> I used just .pyx in this module, should I reimplement it using pxd with
>> the normal .py?
>>
>
> Not at this point. I think it's more important to get some performance
> numbers to see how your module behaves compared to the C accelerator module
> (_json.c). I think the best approach to this project would actually be to
> start with profiling the Python implementation to see where performance
> problems occur (or to look through _json.c to see what the CPython
> developers considered performance critical), and then put the focus on
> trying to speed up only those parts of the Python implementation, by adding
> static types and potentially even rewriting them in a way that Cython can
> optimise them better.
>

I've profiled the module I created and the json module that is in Python
3.2. The result is that the Cython module spent about 73% less time than
Python's one (for nested_dict, 0.268 s against 1.016 s in total; for
ustring, 0.140 s against 0.580 s). The behavior of my module and Python's
one seems to be the same, which I think is the way it should be. The
profiler output was like this ("JSONModule" is the Cython module, "json"
is Python's):

JSONModule nested_dict
         10004 function calls in 0.268 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.196    0.000    0.196    0.000 :0(dumps)
        1    0.000    0.000    0.268    0.268 :0(exec)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.072    0.072    0.268    0.268 <string>:1(<module>)
        1    0.000    0.000    0.268    0.268 profile:0(for ii in range(10000):  fun(thing))
        0    0.000             0.000          profile:0(profiler)


json nested_dict
         60004 function calls in 1.016 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.016    1.016 :0(exec)
    20000    0.136    0.000    0.136    0.000 :0(isinstance)
    10000    0.120    0.000    0.120    0.000 :0(join)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.088    0.088    1.016    1.016 <string>:1(<module>)
    10000    0.136    0.000    0.928    0.000 __init__.py:180(dumps)
    10000    0.308    0.000    0.792    0.000 encoder.py:172(encode)
    10000    0.228    0.000    0.228    0.000 encoder.py:193(iterencode)
        1    0.000    0.000    1.016    1.016 profile:0(for ii in range(10000):  fun(thing))
        0    0.000             0.000          profile:0(profiler)


JSONModule ustring
         10004 function calls in 0.140 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.072    0.000    0.072    0.000 :0(dumps)
        1    0.000    0.000    0.140    0.140 :0(exec)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.068    0.068    0.140    0.140 <string>:1(<module>)
        1    0.000    0.000    0.140    0.140 profile:0(for ii in range(10000):  fun(thing))
        0    0.000             0.000          profile:0(profiler)


json ustring
         40004 function calls in 0.580 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.092    0.000    0.092    0.000 :0(encode_basestring_ascii)
        1    0.004    0.004    0.580    0.580 :0(exec)
    10000    0.060    0.000    0.060    0.000 :0(isinstance)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.100    0.100    0.576    0.576 <string>:1(<module>)
    10000    0.152    0.000    0.476    0.000 __init__.py:180(dumps)
    10000    0.172    0.000    0.324    0.000 encoder.py:172(encode)
        1    0.000    0.000    0.580    0.580 profile:0(for ii in range(10000):  fun(thing))
        0    0.000             0.000          profile:0(profiler)
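
For reference, output in the format above can be produced with the standard
`profile` and `pstats` modules. The following is only a minimal sketch, not
the script in the repository: the helper name `profile_dumps` and the two
sample inputs are assumptions made for illustration, since the actual
"nested_dict" and "ustring" objects are not shown in this message. The loop
statement is the one that appears in the profiler output.

```python
import io
import json
import profile
import pstats

# Hypothetical sample inputs; the real objects that were profiled are
# not shown in this message.
nested_dict = {"a": {"b": {"c": [1, 2, 3], "d": {"e": "f"}}}}
ustring = "\u00e9\u00e8 some text \u4e2d\u6587"


def profile_dumps(fun, thing, repeat=10000):
    """Profile `repeat` calls of fun(thing) and return the report as text."""
    prof = profile.Profile()
    # Same statement as in the profiler output above, with the loop count
    # passed in through the namespace instead of being hard-coded.
    namespace = {"fun": fun, "thing": thing, "repeat": repeat}
    prof.runctx("for ii in range(repeat):  fun(thing)", namespace, {})

    out = io.StringIO()
    stats = pstats.Stats(prof, stream=out)
    # "stdname" corresponds to the "Ordered by: standard name" header.
    stats.sort_stats("stdname").print_stats()
    return out.getvalue()
```

Calling `print(profile_dumps(json.dumps, nested_dict))` prints the report for
the standard library module. Passing the `dumps` function of the compiled
module instead gives the other side of the comparison. Note that `profile`
adds overhead to every Python-level call, so the numbers are better read as
a relative comparison than as absolute timings.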

The code is updated in the repository. If you have any comments, please
let me know. Thank you very much for your feedback.

Best Regards.

[]s

Arthur


>
> Stefan
>