[SciPy-Dev] Cython as build dependency, file/dll size and current issues

Ralf Gommers ralf.gommers at googlemail.com
Fri Jul 6 12:11:10 EDT 2012


On Fri, Jul 6, 2012 at 10:11 AM, Thouis (Ray) Jones <thouis at gmail.com> wrote:

> On Thu, Jul 5, 2012 at 9:37 PM, Ralf Gommers
> <ralf.gommers at googlemail.com> wrote:
> >
> >
> > On Thu, Jul 5, 2012 at 9:26 PM, Matthew Brett <matthew.brett at gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> On Thu, Jul 5, 2012 at 12:18 PM, Ralf Gommers
> >> <ralf.gommers at googlemail.com> wrote:
> >> >
> >> >
> >> > On Thu, Jul 5, 2012 at 8:57 PM, Matthew Brett
> >> > <matthew.brett at gmail.com> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On Thu, Jul 5, 2012 at 11:35 AM, Ralf Gommers
> >> >> <ralf.gommers at googlemail.com> wrote:
> >> >> >
> >> >> >
> >> >> > On Thu, Jul 5, 2012 at 8:31 PM, Matthew Brett
> >> >> > <matthew.brett at gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> On Thu, Jul 5, 2012 at 11:25 AM, Ralf Gommers
> >> >> >> <ralf.gommers at googlemail.com> wrote:
> >> >> >> > Hi all,
> >> >> >> >
> >> >> >> > On https://github.com/scipy/scipy/pull/261 the problem of the large
> >> >> >> > size of C files generated by Cython came up again, and Matthew
> >> >> >> > suggested adding Cython as a build-time dependency. He also pointed
> >> >> >> > out that this was discussed before, with most people being in favor:
> >> >> >> >
> >> >> >> > http://mail.scipy.org/pipermail/scipy-dev/2009-November/013272.html
> >> >> >> >
> >> >> >> > http://mail.scipy.org/pipermail/scipy-dev/2009-November/013308.html
> >> >> >> >
> >> >> >> > We recently discussed the same issue, and also the size of the
> >> >> >> > binary, on https://github.com/scipy/scipy/pull/211.
> >> >> >> >
> >> >> >> > This is probably also the right moment to point out other recent
> >> >> >> > Cython issues we've had:
> >> >> >> > 1. A memoryview issue with Python 2.4, either a Cython or Numpy bug:
> >> >> >> > https://github.com/numpy/numpy/pull/307
> >> >> >> > 2. We had to manually patch the generated C files when using Cython
> >> >> >> > 0.16 to make them work with MinGW:
> >> >> >> > http://projects.scipy.org/scipy/ticket/1673
> >> >> >> > 3. According to Ray, there's also an indexing bug in Cython 0.16
> >> >> >> > which requires using 0.17-dev for
> >> >> >> > https://github.com/scipy/scipy/pull/261
> >> >> >> >
> >> >> >> > I think it's clear that PRs like #261 above (Ray's ndimage.label
> >> >> >> > rewrite) are in principle a good thing: faster and more general
> >> >> >> > code which is easier to maintain. The question now is what to do,
> >> >> >> > though. Here are some options that I see:
> >> >> >> >
> >> >> >> > a) Keep things as is for now. Accept large file/binary sizes.
> >> >> >> > Manually patch the generated C if necessary.
> >> >> >> > b) Keep things as is for now. Either go back to Cython 0.15, or bump
> >> >> >> > the required numpy version to the latest dev version, so that we
> >> >> >> > don't have to manually patch the generated C files.
> >> >> >> > c) Keep things as they are now, but without accepting file/binary
> >> >> >> > sizes that are too large (what counts as too large is still to be
> >> >> >> > defined). This means we can't get the full benefits of fused types,
> >> >> >> > for example.
> >> >> >> > d) Move to Cython as a build dependency. Write down the required
> >> >> >> > versions and incompatibilities in the docs.
> >> >> >> > e) Include a Cython version in the scipy git repo, and patch it to
> >> >> >> > solve issues 2 and 3 above (and any others that come along).
> >> >> >> > f) Some combination of the above.
> >> >> >> > g) Any other options?
> >> >> >>
> >> >> >> Am I right in thinking that Cython 0.17dev will generate usable C
> >> >> >> files without patching?
> >> >> >
> >> >> >
> >> >> > Yes.
> >> >>
> >> >> How about making Cython 0.17 a developer build-time dependency?
> >> >
> >> >
> >> > That's an option. Requiring a dev version will mean broken builds for
> >> > some of the users who don't read the docs carefully but simply do
> >> > "easy_install cython". I'm not sure how acceptable that is.
> >>
> >> We could surely raise an informative error for that case? I hope it
> >> won't be long before the 0.17 release, but we should check with the
> >> Cython folks.
> >
> >
> > True. Perhaps it's not a big issue.
> >>
> >>
> >> On the plus side, lowering the barrier to rewriting in Cython seems
> >> like a really big win, especially with memoryviews and fused types
> >> available.
> >
> >
> > Agreed about lower barrier and fused types.
> >
> > Memoryviews are still not OK, because of
> > https://github.com/numpy/numpy/pull/307.
> >
> >>
> >> > That still leaves the (mostly orthogonal) question of binary size. I
> >> > just built Ray's PR: _ni_label.so is 1.4 MB. For one function.
> >>
> >> Hmm. 1.4 MB seems OK to me for the binary
> >
> >
> > Really? For one function? If we do that for each function, we end up
> > with 4 GB.
> >
> >>
> >> - but I can see that we'd
> >> have to watch that. Maybe it would be worth asking on the Cython list
> >> whether there is any way of reducing this, maybe by sharing code across
> >> extensions. What is the load time like for that extension?
> >
> >
> > Very poor (all with a hot cache):
> >
> >  $ time python -c ""
> > real    0m0.039s
> > user    0m0.017s
> > sys    0m0.017s
> >
> > $ time python -c "import numpy"
> > real    0m0.187s
> > user    0m0.080s
> > sys    0m0.100s
> >
> > $ time python -c "import _ni_label"
> > real    0m0.206s
> > user    0m0.081s
> > sys    0m0.109s
>
> To be fair, the _ni_label module also imports numpy.


Sorry, I should have mentioned that.


> So the delta is around 0.019 s: still not great, but not as bad as the
> test seems to show. (Unless I was missing something, and 0.019 s is
> actually that bad.)
>

It depends on how you look at it. At the current rate of cythonizing, the
damage is probably fairly limited. That said, adding 10% to the import time
of numpy may still be considered a problem by some.
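
One way to repeat the hot-cache comparison quoted above more systematically is to time several fresh interpreters and take the median. This is an illustrative sketch only: the helper names are hypothetical, and timing `python -c` this way includes interpreter startup, just like the `time` runs in the thread.

```python
import statistics
import subprocess
import sys
import time


def time_command(code, repeats=5):
    """Median wall-clock time (s) of running `python -c code` in a fresh interpreter."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def import_overhead(baseline, with_module):
    """Extra time over a baseline: absolute (s) and as a fraction of the baseline."""
    delta = with_module - baseline
    return delta, delta / baseline
```

For example, `import_overhead(time_command("import numpy"), time_command("import scipy.ndimage._ni_label"))` measures the extra cost relative to importing numpy; the installed module path `scipy.ndimage._ni_label` is an assumption. Plugging in the figures from the thread, `import_overhead(0.187, 0.206)` gives a delta of about 0.019 s, roughly 10% of the numpy import. On Python 3.7 and later, `python -X importtime -c "import ..."` also prints a per-module breakdown.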

If we were to convert a significant fraction of the code to Cython, though,
this would impose a huge penalty on load time and memory usage. SciPy
currently has a total of 1073 functions and objects, determined by summing
len(module.__all__) over all modules. A load time of 20 ms and a binary size
of O(100 kB) for one function is therefore a bit much.
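
A count like this can be reproduced with a few lines of Python. This is a sketch under assumptions: the thread does not say exactly which modules were included, so the module list is left to the caller, and both helper names are hypothetical.

```python
import importlib
import pkgutil


def submodule_names(package_name):
    """Names of all importable submodules of a package, found recursively."""
    package = importlib.import_module(package_name)
    prefix = package.__name__ + "."
    return [info.name for info in pkgutil.walk_packages(package.__path__, prefix)]


def public_name_count(module_names):
    """Sum of len(module.__all__) over the given modules; modules without __all__ add 0."""
    total = 0
    for name in module_names:
        module = importlib.import_module(name)
        total += len(getattr(module, "__all__", ()))
    return total
```

Subpackages often re-export names from their own submodules in `__all__`, so summing over every submodule can double-count; summing over the public subpackages only (for example `scipy.linalg` and `scipy.stats`) is closer to a count of user-facing objects. As a rough scale check, naive linear extrapolation of the ~0.019 s delta to 1073 objects gives 0.019 s × 1073 ≈ 20.4 s of extra import time, which illustrates the load-time concern.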

Note that the above is not a criticism of your PR. _ni_label now has a
similar footprint to other Cython code in scipy, so this discussion
shouldn't hold up merging it in.

Ralf
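
Option d) from earlier in the thread, combined with the suggestion to raise an informative error when an unsuitable Cython is installed, could look roughly like the following check, run from `setup.py` before any `.pyx` files are compiled. It is a hypothetical sketch, not SciPy's actual build code: the 0.17 minimum reflects this discussion, the error messages are invented, and pre-release suffixes such as `0.17dev` are deliberately treated as satisfying the minimum because 0.17-dev is the version under discussion.

```python
import re

# Hypothetical minimum, based on the 0.17-dev requirement discussed in this thread.
MIN_CYTHON = (0, 17)


def parse_version(version):
    """Leading numeric part of a version string as a tuple, e.g. '0.17dev' -> (0, 17)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def version_ok(version, minimum=MIN_CYTHON):
    """True if `version` is at least `minimum`; pre-release suffixes are ignored."""
    return parse_version(version) >= minimum


def require_cython(minimum=MIN_CYTHON):
    """Fail early with an informative message if Cython is missing or too old."""
    wanted = ".".join(map(str, minimum))
    try:
        import Cython
    except ImportError:
        raise RuntimeError(
            f"Building from source requires Cython >= {wanted}; "
            "a development version may be needed."
        ) from None
    if not version_ok(Cython.__version__, minimum):
        raise RuntimeError(
            f"Cython {Cython.__version__} is installed, but >= {wanted} is required."
        )
```

Calling `require_cython()` near the top of `setup.py` would make a build with an older release (for instance one pulled in by a plain `easy_install cython`) stop with a clear message instead of failing later in the compile step.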