[Numpy-discussion] My GSoC Proposal to Implement a Subset of NumPy for PyPy

Dag Sverre Seljebotn dagss at student.matnat.uio.no
Sat Apr 17 03:12:50 EDT 2010


Dan Roberts wrote:
> Hello NumPy Users,
>     Hi everybody, my name is Dan Roberts, and my Google Summer of Code 
> proposal was categorized under NumPy rather than PyPy, so it will end up 
> being reviewed by mentors for the NumPy project.  I'd like to take this 
> chance to introduce myself and my proposal.
>     I hadn't prepared for review by the NumPy mentors, but this can make 
> my proposal stronger than before.  With a bit of help from all of you, I 
> can dedicate my summer to creating more useful code than I would have 
> previously. I realize that from the perspective of NumPy, my proposal 
> might seem lacking, so I'd like to also invite the scrutiny of all of 
> the readers of this list.
>     Why should we bother reimplementing anything?  PyPy, for those who 
> are unfamiliar, has the ability to Just-in-Time compile itself and 
> programs that it's running.  One of the major advantages of this is that 
> code operating on NumPy arrays could potentially be written in 
> pure Python, with normal looping constructs, and be nearly as fast as a 
> ufunc painstakingly crafted in C.  I'd love to see as much Python and as 
> little C as possible, and I'm sure I'm not alone in that wish.
>     A short introduction: I've been coding in Python for the past few 
> years, and have increasingly become interested in speeding up what has 
> become my favorite language. To that end I've become interested in both 
> the PyPy project and the NumPy projects. I've spent a fair amount of 
> time frustrating the PyPy developers with silly questions, written a bit 
> of code for them, and now my GSoC proposal involves both them and 
> NumPy.
>     Finally, I'd like to ask all of you: what features are most 
> important to you? It's not practical, wise, or even possible for me to 
> reimplement more than a small portion of NumPy, but if I can address the 
> most important parts, maybe I can make this project useful enough for 
> some of you to use, and close enough for the rest of you that I can drum 
> up some support for more development in the future.
>      My proposal lives at http://codespeak.net/~dan/gsoc/micronumpy.html. 
> Thanks for making it this far through my long-winded introduction!  I 
> welcome all constructive criticism and thoughts.

I'm curious about what role natively compiled C code would play in 
your project. Would you use BLAS, or would you reimplement e.g. matrix 
multiplication in RPython and hope that PyPy optimizes it? (Hint: It 
stands no chance of even coming close. A BLAS implementation is easily 
4-5 times faster (or more) than simple hand-written C code for matrix 
multiplication, which I assume is the best case for any RPython code 
it is realistic to write. BLAS implementations use CPU-specific, 
cache-aware algorithms which you really can't hope to implement yourself.)
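
To make the comparison concrete, here is a minimal sketch (not part of the original proposal or of any PyPy code) of the "simple hand-written" triple-loop matrix multiplication, next to a call to numpy.dot, which hands the work to BLAS when NumPy is built against one. The names naive_matmul and time_both are made up for this illustration. It runs interpreted loops on whatever Python you have, so it says nothing about RPython or a JIT, and it does not reproduce the 4-5x figure above, which is about C versus BLAS.

```python
import time


def naive_matmul(a, b):
    """Multiply two matrices given as lists of rows, using plain loops."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    result = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0.0
            for p in range(k):
                total += a[i][p] * b[p][j]
            result[i][j] = total
    return result


def time_both(n=100):
    """Return (loop_seconds, dot_seconds) for one n-by-n product."""
    # NumPy is imported here so naive_matmul stays usable without it.
    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))

    start = time.perf_counter()
    looped = naive_matmul(a.tolist(), b.tolist())
    loop_seconds = time.perf_counter() - start

    start = time.perf_counter()
    dotted = np.dot(a, b)  # uses BLAS if NumPy was built against one
    dot_seconds = time.perf_counter() - start

    # Both paths should agree up to floating-point rounding.
    assert np.allclose(looped, dotted)
    return loop_seconds, dot_seconds
```

Calling time_both() with a few different sizes gives a rough feel for the gap on a given machine. A single timing like this is noisy, so treat the numbers as indicative only.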

Eventually, for this to be at all useful for the NumPy crowd, one has to 
make available eigenvalue finders, FFTs, and so on as well. This is a 
massive amount of work unless one is willing to connect to existing C 
implementations.

So even if all of this doesn't happen in the GSoC project, it would be 
useful to know whether it is possible long-term to connect with BLAS and 
LAPACK, or whether you intend everything to be done in RPython.

In my opinion, the *primary* reason Python is used for scientific 
programming rather than some other language is how easy it is to connect 
with C, C++ and Fortran code in CPython. That's something to keep in mind.
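
As one small illustration of that ease (my own sketch, and only one of several routes, which the paragraph above does not choose between), the standard-library ctypes module can call a function in an existing compiled C library without writing any wrapper code in C. The example assumes a Unix-like system where the C math library can be located. The helper name load_c_cos is hypothetical.

```python
import ctypes
import ctypes.util


def load_c_cos():
    """Return the C math library's cos() wrapped for use from Python."""
    # Assumption: a Unix-like system; fall back to the usual Linux name
    # if find_library cannot locate the math library.
    name = ctypes.util.find_library("m") or "libm.so.6"
    libm = ctypes.CDLL(name)
    c_cos = libm.cos
    # Declare the C signature: double cos(double).
    c_cos.argtypes = [ctypes.c_double]
    c_cos.restype = ctypes.c_double
    return c_cos


c_cos = load_c_cos()
print(c_cos(0.0))
```

Declaring argtypes and restype matters here: without them ctypes assumes int arguments and an int return value, and the result would be garbage.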

-- 
Dag Sverre


