[Cython] GSoC Proposal - Reimplement C modules in CPython's standard library in Cython.

Robert Bradshaw robertwb at math.washington.edu
Tue Apr 12 22:42:02 CEST 2011


On Tue, Apr 12, 2011 at 11:22 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Arthur de Souza Ribeiro, 12.04.2011 14:59:
>>
>> Hi Stefan, yes, I'm working on this. In fact, I'm trying to recompile the json
>> module (http://docs.python.org/library/json.html), adding some type
>> definitions and other Cython features to make the code faster.
>
> Cool.
>
>
>> I'm running into trouble with some things too, so I'm going to enumerate
>> them here so that you can give me some tips on how to solve them.
>>
>> 1 - Compile package modules - the json module is inside a package (files:
>> __init__.py, decoder.py, encoder.py). Is there a way to have Cython generate
>> the compiled modules for a package, just as it does for a single module?
>
> The __init__.py doesn't really look performance critical. It's better to
> leave that module in plain Python; that improves readability by reducing
> surprises and simplifies reuse by other implementations.
>
> That being said, you can compile each module separately. Just use the
> "cython" command line tool for that, or write a little distutils script as in
>
> http://docs.cython.org/src/quickstart/build.html#building-a-cython-module-using-distutils
>
> Don't worry too much about a build integration for now.
>
>
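To make the distutils suggestion above concrete, here is a minimal sketch of such a build script. It assumes a current Cython that provides `Cython.Build.cythonize` (the quickstart page linked above may describe a different, older spelling), and it assumes the package sources live in a local directory named `json`. The helper names `sources_to_compile` and `build`, and the project name `json-compiled`, are hypothetical. `__init__.py` is skipped so that it stays in plain Python.

```python
import os


def sources_to_compile(package_dir, skip=("__init__.py",)):
    """List the .py files in package_dir that should be compiled, sorted."""
    return sorted(
        os.path.join(package_dir, name)
        for name in os.listdir(package_dir)
        if name.endswith(".py") and name not in skip
    )


def build(package_dir="json"):
    """Compile the selected modules in place (needs Cython and a C compiler)."""
    # Imported here so that the helper above works without Cython installed.
    from setuptools import setup
    from Cython.Build import cythonize

    setup(
        name="json-compiled",  # hypothetical project name
        ext_modules=cythonize(sources_to_compile(package_dir)),
        script_args=["build_ext", "--inplace"],
    )
```

Calling `build()` from the directory that contains the package should leave one extension module next to each compiled `.py` file, while `__init__.py` remains untouched.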
>> 2 - Because I'm having trouble with issue #1, I'm running the tests
>> manually: I go to %Python-dir%/Lib/tests/json_tests, take the files
>> corresponding to the tests Python runs, and run them by hand.
>
> That's fine.
>
>
>> 3 - To measure the performance of the module, I'm thinking about using the
>> timeit function in the unit tests for the project. I think a good number of
>> executions could be run, and it would be possible to compare the timings.
>
> That's ok for a start, artificial benchmarks are good to test specific
> functionality. However, unit tests tend to be short running with a lot of
> overhead, so later on, you will need to use real code to benchmark the
> modules. I would expect that there are benchmarks for JSON implementations
> around, and you can just generate a large JSON file and run loads and dumps
> on it.
>
>
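The "generate a large JSON file and run loads and dumps on it" idea could look roughly like the sketch below. It uses only the standard library; the document shape, the record count, and the helper names `make_document` and `bench` are arbitrary assumptions, not an established benchmark. To compare implementations, the same script would be run once against the plain Python modules and once against the compiled ones.

```python
import json
import random
import string
import timeit


def make_document(n_records=2000, seed=0):
    """Build a reproducible list of small records to serialise."""
    rng = random.Random(seed)

    def word():
        return "".join(rng.choice(string.ascii_letters) for _ in range(8))

    return [
        {
            "id": i,
            "name": word(),
            "score": rng.random(),
            "tags": [word() for _ in range(3)],
            "active": i % 2 == 0,
        }
        for i in range(n_records)
    ]


def bench(func, arg, repeat=3, number=5):
    """Return the best observed time in seconds for one call of func(arg)."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number)) / number


document = make_document()
text = json.dumps(document)

print("dumps: %.4f s per call" % bench(json.dumps, document))
print("loads: %.4f s per call" % bench(json.loads, text))
```

Taking the minimum over several repeats reduces noise from other processes, which matters more here than in a short-running unit test.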
>> 4 - I didn't create the .pxd files. Some problems are happening: I get errors
>> saying that methods are not defined, although they are defined. I will try
>> to investigate this further.
>
> When reporting usage related problems (preferably on the cython-users
> mailing list), it's best to present the exact error messages and the
> relevant code snippets, so that others can quickly understand what's going
> on and/or reproduce the problem.
>
>
>> The code is in this repository:
>> https://github.com/arthursribeiro/JSON-module
>> Your feedback would be very valuable, so that I can improve my skills and be
>> ready to work on the project sooner.
>
> I'd strongly suggest implementing this in pure Python (.py files instead of
> .pyx files), with externally provided static types for performance. A single
> code base is very advantageous for a large project like CPython, much more
> than the ultimate 5% better performance.

While this is advantageous for the final product, it may not be the
easiest to get up and running with.
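For reference, the single-code-base approach can be sketched as follows: the module stays an ordinary `.py` file, and the static types live in a companion `.pxd` file that only Cython reads. The module name `scanner_sketch` and the function `skip_whitespace` are hypothetical and are not taken from the json package. The `.pxd` content is shown as a comment and follows my understanding of Cython's pure Python mode, so treat it as an assumption to check against the Cython documentation.

```python
# scanner_sketch.py (hypothetical module; plain Python, runs unchanged)

WHITESPACE = " \t\n\r"


def skip_whitespace(s, idx):
    """Return the index of the first non-whitespace character at or after idx."""
    n = len(s)
    while idx < n and s[idx] in WHITESPACE:
        idx += 1
    return idx


# scanner_sketch.pxd (hypothetical companion file, read only by Cython):
#
#     import cython
#
#     @cython.locals(n=cython.Py_ssize_t)
#     cpdef Py_ssize_t skip_whitespace(s, Py_ssize_t idx)

print(skip_whitespace("  \n\t[1, 2]", 0))  # prints 4
```

Without Cython the `.pxd` file is simply ignored, so other Python implementations can keep using the same source.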

>> I think some things implemented in this rewriting process are going to be
>> useful when doing this with C modules...
>
> Well, if you can get the existing Python implementation up to a speed mostly
> comparable to the C implementation, then there is no need to care
> about the C module anymore. Even if you can get only 90% of a module to run
> at comparable speed, and need to keep 10% in plain C, that's already a huge
> improvement in terms of maintainability.
>
> Stefan

