[Numpy-discussion] How a transition to C++ could work

Sun Feb 19 10:20:12 EST 2012

On Feb 19, 2012, at 2:18 AM, Mark Wiebe <mwwiebe at gmail.com> wrote:

The suggestion of transitioning the NumPy core code from C to C++ has
sparked a vigorous debate, and I thought I'd start a new thread to give my
perspective on some of the issues raised, and describe how such a
transition could occur.

First, I'd like to reiterate the gcc rationale for their choice to switch:
http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale

In particular, these points deserve emphasis:

   - The C subset of C++ is just as efficient as C.
   - C++ supports cleaner code in several significant cases.
   - C++ makes it easier to write cleaner interfaces by making it harder to
   break interface boundaries.
   - C++ never requires uglier code.

Some people have pointed out that the Python templating preprocessor used
in NumPy is suggestive of C++ templates. A nice advantage of using C++
templates instead of this preprocessor is that third-party tools to improve
software quality, like static analysis tools, will be able to run directly
on the NumPy source code. Additionally, IDEs like Xcode and Visual C++ will
be able to provide the full suite of tab-completion/IntelliSense features
that programmers working in those environments are accustomed to.
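
To make the template point concrete, here is a minimal sketch, not taken from
the NumPy sources. The names add_arrays, example_add_int32 and
example_add_double are hypothetical. It shows one template body standing in
for the per-type loops a source preprocessor would otherwise generate, with
plain C entry points on top:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical element-wise add. One template body stands in for the
// per-type loops a source preprocessor would otherwise generate.
template <typename T>
void add_arrays(const T *a, const T *b, T *out, std::size_t n)
{
    assert(n == 0 || (a && b && out));  // debug-only precondition check
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Plain C entry points (hypothetical names), one per element type.
extern "C" void example_add_int32(const std::int32_t *a, const std::int32_t *b,
                                  std::int32_t *out, std::size_t n)
{
    add_arrays(a, b, out, n);
}

extern "C" void example_add_double(const double *a, const double *b,
                                   double *out, std::size_t n)
{
    add_arrays(a, b, out, n);
}
```

Because this is ordinary C++ rather than a custom template format, a static
analyzer or an IDE can parse it without first running a code generator.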

There are concerns about ABI/API interoperability and interactions with C++
exceptions. I've dealt with these types of issues on enough platforms to
know that while they're important, they're a lot easier to handle than the
issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that
providing a C API from a C++ library is no harder than providing a C API
from a C library.
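
As a rough illustration of that claim, the sketch below keeps exceptions
inside the C++ implementation and exposes only C types and integer status
codes at the boundary. Everything in it is an assumption made up for the
example: the names checked_mean and example_mean, and the convention of 0 for
success and negative values for errors.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

namespace {

// Internal C++ implementation: free to report errors by throwing.
double checked_mean(const double *data, std::size_t n)
{
    if (!data || n == 0) {
        throw std::invalid_argument("checked_mean: empty input");
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum / static_cast<double>(n);
}

}  // namespace

// C-callable wrapper (hypothetical name). No exception may escape it:
// every failure is translated into a status code.
extern "C" int example_mean(const double *data, std::size_t n, double *result)
{
    if (!result) {
        return -1;  // invalid argument
    }
    try {
        *result = checked_mean(data, n);
        return 0;   // success
    } catch (const std::invalid_argument &) {
        return -1;  // invalid argument
    } catch (...) {
        return -2;  // any other failure
    }
}

// Caller-side view: only plain C types and return codes are visible.
void example_usage()
{
    const double values[4] = {1.0, 2.0, 3.0, 4.0};
    double mean = 0.0;
    assert(example_mean(values, 4, &mean) == 0);
    assert(example_mean(values, 0, &mean) == -1);
}
```

The catch-all handler is the important part. A C caller cannot unwind a C++
exception safely, so the wrapper has to stop everything at the boundary.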

It's worth comparing the possibility of C++ versus the possibility of other
languages, and the ones that have been suggested for consideration are D,
Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language
has to interact naturally with the CPython API. It needs to provide direct
access to all the various sizes of signed int, unsigned int, and float. It
needs to have mature compiler support wherever we want to deploy NumPy.
Taken together, these requirements eliminate a majority of these
possibilities. From these criteria, the only languages which seem to have a
clear possibility for the implementation of NumPy are C, C++, and D. For D,
I suspect the tooling is not mature enough, but I'm not 100% certain of
that.

I am a huge fan of D, but you are dead on about its tooling, so +1 on the
observation. Its code generation, especially with respect to floating point,
is also a known area needing improvement, IIRC.

The biggest question for any of these possibilities is how do you get the
code from its current state to a state which fully utilizes the target
language. C++, being nearly a superset of C, offers a strategy to gradually
absorb C++ features. Any of the other language choices requires a rewrite,
which would be quite disruptive. For all these reasons taken together, I
believe the only realistic language to use, other than sticking with C, is
C++.

Finally, here's what I think is the best strategy for transitioning to C++.
First, let's consider what we do if 1.7 becomes an LTS release.

1) Immediately after branching for 1.7, we minimally patch all the .c files
so that they can build with a C++ compiler and with a C compiler at the
same time. Then we rename all .c -> .cpp, and update the build systems for
C++.
2) During the 1.8 development cycle, we heavily restrict C++ feature usage.
But, where a feature implementation would be arguably easier and less
error-prone with C++, we allow it. This is a period for learning about C++
and how it can benefit NumPy.
3) After the 1.8 release, the community will have developed more experience
with C++, and will be in a better position to discuss a way forward.
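
For a sense of what the "minimal patch" in step 1 tends to involve, here is a
small sketch of code written in the common subset of C and C++. The function
dup_doubles is a hypothetical helper, not something from the NumPy sources.
The usual adjustments are an explicit cast on the result of malloc, no
identifiers that are C++ keywords (new, class, template, and so on), and an
extern "C" guard so the symbol keeps C linkage under either compiler.

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical helper: return a heap copy of n doubles, or NULL on
 * allocation failure. The caller releases the copy with free(). */
double *dup_doubles(const double *src, size_t n)
{
    double *copy;

    assert(src != NULL);  /* debug-only precondition check */

    /* C converts void* implicitly; C++ does not, so the cast is needed
     * for this file to build with both compilers. */
    copy = (double *)malloc(n * sizeof(double));
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, n * sizeof(double));
    return copy;
}

#ifdef __cplusplus
}
#endif
```

The same file builds unchanged as C or as C++, which is what lets the rename
from .c to .cpp happen without touching behavior.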

If, for some reason, a 1.7 LTS is unacceptable, it might be a good idea to
restrict the 1.8 release to the common subset of C and C++. I would much
prefer using the 1.8 development cycle to dip our toes into the C++ world
to get some of the low-hanging benefits without doing anything disruptive.

A really important point to emphasize is that C++ allows for a strategy
where we gradually evolve the codebase to better incorporate its language
features. This is what I'm advocating. No massive rewrite, no disruptive
changes. Gradual code evolution, with ABI and API compatibility comparable
to what we've delivered in 1.6 and the upcoming 1.7 releases.

Thanks,
Mark
