[SciPy-Dev] Cython as build dependency, file/dll size and current issues

Matthew Brett matthew.brett at gmail.com
Thu Jul 5 15:45:26 EDT 2012


Hi,

On Thu, Jul 5, 2012 at 12:37 PM, Ralf Gommers
<ralf.gommers at googlemail.com> wrote:
>
>
> On Thu, Jul 5, 2012 at 9:26 PM, Matthew Brett <matthew.brett at gmail.com>
> wrote:
>>
>> Hi,
>>
>> On Thu, Jul 5, 2012 at 12:18 PM, Ralf Gommers
>> <ralf.gommers at googlemail.com> wrote:
>> >
>> >
>> > On Thu, Jul 5, 2012 at 8:57 PM, Matthew Brett <matthew.brett at gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Thu, Jul 5, 2012 at 11:35 AM, Ralf Gommers
>> >> <ralf.gommers at googlemail.com> wrote:
>> >> >
>> >> >
>> >> > On Thu, Jul 5, 2012 at 8:31 PM, Matthew Brett
>> >> > <matthew.brett at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> On Thu, Jul 5, 2012 at 11:25 AM, Ralf Gommers
>> >> >> <ralf.gommers at googlemail.com> wrote:
>> >> >> > Hi all,
>> >> >> >
>> >> >> > On https://github.com/scipy/scipy/pull/261 the problem with large
>> >> >> > size of generated C files from Cython came up again, and Matthew
>> >> >> > suggested to add Cython as a build time dependency. He also
>> >> >> > pointed out that this was discussed before, with most people
>> >> >> > being in favor:
>> >> >> >
>> >> >> > http://mail.scipy.org/pipermail/scipy-dev/2009-November/013272.html
>> >> >> >
>> >> >> > http://mail.scipy.org/pipermail/scipy-dev/2009-November/013308.html
>> >> >> >
>> >> >> > We discussed the same issue on
>> >> >> > https://github.com/scipy/scipy/pull/211 recently, and also the
>> >> >> > size of the binary.
>> >> >> >
>> >> >> > This is probably also the right moment to point out other recent
>> >> >> > Cython issues we've had:
>> >> >> > 1. A memoryview issue with Python 2.4, either a Cython or Numpy
>> >> >> >    bug: https://github.com/numpy/numpy/pull/307
>> >> >> > 2. We had to manually patch the generated C files when using
>> >> >> >    Cython 0.16, to make them work with MinGW:
>> >> >> >    http://projects.scipy.org/scipy/ticket/1673
>> >> >> > 3. According to Ray, there's also an indexing bug in Cython 0.16
>> >> >> >    which requires to use 0.17-dev for
>> >> >> >    https://github.com/scipy/scipy/pull/261
>> >> >> >
>> >> >> > I think it's clear that PR's like #261 above (Ray's ndimage.label
>> >> >> > rewrite) are in principle a good thing: faster and more general
>> >> >> > code which is easier to maintain. Now the question is what to do
>> >> >> > though. Here's some options that I see:
>> >> >> >
>> >> >> > a) Keep things as is for now. Accept large file/binary sizes.
>> >> >> >    Manually patch the generated C if necessary.
>> >> >> > b) Keep things as is for now. Either go back to Cython 0.15, or
>> >> >> >    bump required numpy version to latest dev version to not have
>> >> >> >    to manually patch the generated C files.
>> >> >> > c) Keep things as they are now, without accepting too large
>> >> >> >    file/binary sizes. To be defined what too large. Means we can't
>> >> >> >    get the full benefits of fused types for example.
>> >> >> > d) Move to Cython as a build dependency. Write down the required
>> >> >> >    versions and incompatibilities in the docs.
>> >> >> > e) Include a Cython version in the scipy git repo, patch it to
>> >> >> >    solve the above issues 2 and 3 (and any other ones that come
>> >> >> >    along).
>> >> >> > f) Some combination of the above.
>> >> >> > g) Any other options?
>> >> >>
>> >> >> Am I right in thinking that Cython 0.17dev will generate usable C
>> >> >> files without patching?
>> >> >
>> >> >
>> >> > Yes.
>> >>
>> >> How about making Cython 0.17 a developer build-time dependency?
>> >
>> >
>> > That's an option. Requiring a dev version will mean broken builds for
>> > some of the users that don't read the docs well but simply do
>> > "easy_install cython". I'm not sure how acceptable that is.
>>
>> We could surely raise an informative error for that case? I hope that
>> it won't be long before the 0.17 release - but we should check with
>> the Cython folks.
>
>
> True. Perhaps it's not a big issue.
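
For concreteness, here is a minimal sketch of what such an informative
error could look like in a build script. Everything in it is an
assumption for illustration: the names CythonVersionError,
parse_major_minor, check_cython_version and installed_cython_version are
hypothetical, and the parsing deliberately ignores suffixes, so a 0.17
pre-release is treated as 0.17.

```python
import re


class CythonVersionError(RuntimeError):
    """Raised when Cython is missing or too old (hypothetical name)."""


def parse_major_minor(version):
    # Keep only the leading "major.minor" integers; suffixes such as
    # ".beta1" or "-dev" are ignored, so a 0.17 pre-release counts as 0.17.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("cannot parse version string %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def check_cython_version(installed, required="0.17"):
    """Raise an informative error unless `installed` is at least `required`.

    `installed` is a version string, or None when Cython cannot be imported.
    """
    if installed is None:
        raise CythonVersionError(
            "Building from a git checkout needs Cython >= %s, "
            "but Cython is not installed." % required)
    if parse_major_minor(installed) < parse_major_minor(required):
        raise CythonVersionError(
            "Building from a git checkout needs Cython >= %s, but found "
            "Cython %s. Please upgrade Cython and try again."
            % (required, installed))


def installed_cython_version():
    # Returns Cython's version string, or None if it cannot be imported.
    try:
        import Cython
    except ImportError:
        return None
    return Cython.__version__
```

A build script could call
check_cython_version(installed_cython_version()) before running Cython,
so that a plain "easy_install cython" of an older release fails with a
readable message rather than a broken build.
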
>>
>>
>> On the plus side, lowering the barrier to rewriting in Cython seems
>> like a really big win, especially with memoryviews and fused types
>> available.
>
>
> Agreed about lower barrier and fused types.
>
> Memoryviews are still not OK, because of
> https://github.com/numpy/numpy/pull/307.

I'm afraid I didn't understand that discussion very well. Does that
only apply to Python 2.4? I had the impression we were dropping 2.4
compatibility, but I may be misremembering.

>>
>> > That still leaves the (mostly orthogonal) question about binary size.
>> > I just built Ray's PR, _ni_label.so is 1.4 Mb. For one function.
>>
>> Hmm.   1.4 Mb seems OK to me for the binary
>
>
> Really? For one function? If we do that for each function, we end up
> with 4 Gb.
>
>>
>> - but I can see that we'd
>> have to watch that.  Maybe it would be worth asking on the Cython list
>> whether there is any way of reducing this, maybe by sharing across
>> extensions.   How is the load time for that extension?
>
>
> Very poor. (all hot cache):
>
> $ time python -c ""
> real    0m0.039s
> user    0m0.017s
> sys    0m0.017s
>
> $ time python -c "import numpy"
> real    0m0.187s
> user    0m0.080s
> sys    0m0.100s
>
> $ time python -c "import _ni_label"
> real    0m0.206s
> user    0m0.081s
> sys    0m0.109s

I guess that's much slower than the original C extension?  I'd tend to
prefer a slow loading but fast running and maintainable ndimage, but
it's unfortunate we have to keep these tradeoffs in mind...
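
To compare load times like the ones quoted above more systematically,
something along these lines might do. It is only a sketch: time_command
and import_overhead are hypothetical helper names, and it measures
wall-clock time of a fresh interpreter, so the numbers will be noisy.

```python
import subprocess
import sys
import time


def time_command(code, repeats=5):
    """Return the best wall-clock time (seconds) of `python -c code`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # A fresh interpreter each time, so nothing is already imported.
        subprocess.check_call([sys.executable, "-c", code])
        best = min(best, time.perf_counter() - start)
    return best


def import_overhead(module, baseline="", repeats=5):
    """Extra start-up time of `import module` on top of running `baseline`."""
    with_import = time_command(baseline + "\nimport " + module, repeats)
    return with_import - time_command(baseline, repeats)
```

For example, import_overhead("_ni_label", baseline="import numpy") would
report only what the extension adds on top of importing numpy. Going by
the timings above, and assuming the extension imports numpy itself, that
is the difference between 0.206s and 0.187s rather than the full 0.206s.
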

Cheers,

Matthew


