[Numpy-discussion] Proposed Roadmap Overview

Sat Feb 18 16:40:48 EST 2012

On Sat, Feb 18, 2012 at 2:17 PM, David Cournapeau <cournape at gmail.com> wrote:

> On Sat, Feb 18, 2012 at 8:45 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> >
> >
> > On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett <matthew.brett at gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris
> >> <charlesr.harris at gmail.com> wrote:
> >> >
> >> >
> >> > On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett
> >> > <matthew.brett at gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi.
> >> >>
> >> >> On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire
> >> >> <cjordan1 at uw.edu> wrote:
> >> >> > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett
> >> >> > <matthew.brett at gmail.com> wrote:
> >> >> >> Hi,
> >> >> >>
> >> >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire
> >> >> >> <cjordan1 at uw.edu> wrote:
> >> >> >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden <sturla at molden.no>
> >> >> >>> wrote:
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> On 18 Feb 2012, at 05:01, Jason Grout
> >> >> >>>> <jason-sage at creativetrax.com> wrote:
> >> >> >>>>
> >> >> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote:
> >> >> >>>>>> We would have to write a C++ programming tutorial that is based
> >> >> >>>>>> on Python knowledge instead of C knowledge.
> >> >> >>>>>
> >> >> >>>>> I personally would love such a thing.  It's been a while since I
> >> >> >>>>> did anything nontrivial on my own in C++.
> >> >> >>>>>
> >> >> >>>>
> >> >> >>>> One example: How do we code multiple return values?
> >> >> >>>>
> >> >> >>>> In Python:
> >> >> >>>> - Return a tuple.
> >> >> >>>>
> >> >> >>>> In C:
> >> >> >>>> - Use pointers (evilness)
> >> >> >>>>
> >> >> >>>> In C++:
> >> >> >>>> - Return a std::tuple, as you would in Python.
> >> >> >>>> - Use references, as you would in Fortran or Pascal.
> >> >> >>>> - Use pointers, as you would in C.
> >> >> >>>>
> >> >> >>>> C++ textbooks always pick the last...
> >> >> >>>>
> >> >> >>>> I would show the first and the second method, and perhaps
> >> >> >>>> intentionally forget the last.
> >> >> >>>>
> >> >> >>>> Sturla
> >> >> >>>>
> >> >> >>
> >> >> >>> On the flip side, cython looked pretty...but I didn't get the
> >> >> >>> performance gains I wanted, and had to spend a lot of time figuring
> >> >> >>> out if it was cython, needing to add types, buggy support for numpy,
> >> >> >>> or actually the algorithm.
> >> >> >>
> >> >> >> At the time, was the numpy support buggy?  I personally haven't had
> >> >> >> many problems with Cython and numpy.
> >> >> >>
> >> >> >
> >> >> > It's not that the support WAS buggy, it's that it wasn't clear to me
> >> >> > what was going on and where my performance bottleneck was, even after
> >> >> > microbenchmarking with ipython, using timeit and prun, and using the
> >> >> > cython code visualization tool. Ultimately I don't think it was
> >> >> > cython, so perhaps my comment was a bit unfair. But it was
> >> >> > unfortunately difficult to verify that. Of course, as you say,
> >> >> > diagnosing and solving such issues would become easier with more
> >> >> > cython experience.
> >> >> >
> >> >> >>> The C files generated by cython were
> >> >> >>> enormous and difficult to read. They really weren't meant for human
> >> >> >>> consumption.
> >> >> >>
> >> >> >> Yes, it takes some practice to get used to what Cython will do, and
> >> >> >> how to optimize the output.
> >> >> >>
> >> >> >>> As Sturla has said, regardless of the quality of the
> >> >> >>> current product, it isn't stable.
> >> >> >>
> >> >> >> I've personally found it more or less rock solid.  Could you say
> >> >> >> what you mean by "it isn't stable"?
> >> >> >>
> >> >> >
> >> >> > I just meant what Sturla said, nothing more:
> >> >> >
> >> >> > "Cython is still 0.16, it is still unfinished. We cannot base NumPy
> >> >> > on an unfinished compiler."
> >> >>
> >> >> Y'all mean, it has a zero at the beginning of the version number and
> >> >> it is still adding new features?  Yes, that is correct, but it seems
> >> >> more reasonable to me to phrase that as 'active development' rather
> >> >> than 'unstable', because they take considerable care to be backwards
> >> >> compatible, have a large automated Cython test suite, and a major
> >> >> stress-tester in the Sage test suite.
> >> >>
> >> >
> >> > Matthew,
> >> >
> >> > No one in their right mind would build a large performance library
> >> > using Cython, it just isn't the right tool. For what it was designed
> >> > for - wrapping existing C code or writing small and simple things
> >> > close to Python - it does very well, but it was never designed for
> >> > making core C/C++ libraries and in that role it just gets in the way.
> >>
> >> I believe the proposal is to refactor the lowest levels in pure C and
> >> move some or most of the library superstructure to Cython.
> >
> >
> > Go for it.
>
> The proposal of moving to a C core + cython has been discussed by
> multiple contributors. It is certainly a valid proposal. *I* have
> worked on this (npymath, separate compilation), although certainly not
> as much as I would have wanted to. I think much can be done in that
> vein. Using the "shut up if you don't do it" argument is a straw man
> (and uncalled for).
>

OK, I was annoyed.

>
> Moving away from subjective considerations on how to do things, is
> there a way that one can see the pros/cons of each approach? For the
> C++ approach, I would really like to see which C++ is being
> considered. I was. Once the choice is made, going back would be quite
> hard, so I can't see how we could go for it just because some people
> prefer it without very clear technical arguments.
>

Well, we already have code obfuscation (DOUBLE_your_pleasure,
FLOAT_your_boat), so we might as well let the compiler handle it. Having
classes, lists, and iterators would be a big plus. The current code is
really a kludge trying to make C look like C++. That is not inherently bad:
the original C++ (C with Classes) was a preprocessor that generated C code.
I really think the best argument against C++ is portability, and I think
that needs to be evaluated. But in many ways it supports the sort of things
the Numpy C code does in a natural way.
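
As a rough illustration of "letting the compiler handle it", the sketch
below writes a strided inner loop once as a function template instead of
generating one type-prefixed C function per element type. The name add_loop
and its signature are hypothetical and are not taken from the Numpy
sources; strides here are counted in elements, not bytes, to keep the
example short. It sticks to C++98 features, so it does not depend on C++11.

```cpp
#include <cstddef>

// Hypothetical example, not Numpy API: one template stands in for a family
// of per-type functions (one for double, one for float, and so on).
// Strides are in units of elements and are assumed to keep every access
// inside the caller's arrays.
template <typename T>
void add_loop(const T* a, std::ptrdiff_t stride_a,
              const T* b, std::ptrdiff_t stride_b,
              T* out, std::ptrdiff_t stride_out,
              std::ptrdiff_t n)
{
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        // The compiler instantiates this body separately for each T used.
        out[i * stride_out] = a[i * stride_a] + b[i * stride_b];
    }
}
```

A caller would write add_loop<double>(...) or add_loop<float>(...), and the
compiler would produce the per-type variants that are currently spelled out
by name.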

I'll let Mark expand on the virtues if he is so inclined, but C++ code
offers a higher level of abstraction that is very useful and allows good
reuse of properly constructed tools. The emphasis here is on 'properly'.
There is certainly bad C++ code out there.

> Saying that C++ is more readable, or scales better, is frankly very
> weak and too subjective to be convincing. There are too many projects
> way more complex than numpy that have been done in either C or C++.
>
>
To some extent that is experience based. And to another extent, it is a
question of what language people like to develop in. I myself would prefer
C++. The main thing I really don't like about C++ is IO. But Boost offers
some relief for that. I expect we will use small bits of Boost that can be
excised without problems from the bigger library.

I don't think we can count on C++11 at this point, so we would probably be
conservative in our choice of features.

Jim Hugunin was a keynote speaker at one of the scipy conventions. At
dinner he said that if he were to do it again he would use managed code ;) I
don't propose we do that, but tools do advance.

Chuck