[Numpy-discussion] My GSoC Proposal to Implement a Subset of NumPy for PyPy

Stéfan van der Walt stefan at sun.ac.za
Sat Apr 17 03:24:40 EDT 2010


Hi Dan

On 17 April 2010 06:50, Dan Roberts <ademan555 at gmail.com> wrote:
>     Hi everybody, my name is Dan Roberts, and my Google Summer of Code
> proposal was categorized under NumPy rather than PyPy, so it will end up
> being reviewed by mentors for the NumPy project.  I'd like to take this
> chance to introduce myself and my proposal.

Thanks for the introduction, and welcome to NumPy!

>     I hadn't prepared for review by the NumPy mentors, but this can make my
> proposal stronger than before.  With a bit of help from all of you, I can
> dedicate my summer to creating more useful code than I would have
> previously. I realize that from the perspective of NumPy, my proposal might
> seem lacking, so I'd like to also invite the scrutiny of all of the readers
> of this list.

This proposal builds a bridge between two projects, so even if it
technically falls under the NumPy banner, we'll lean heavily on Maciej
Fijalkowski from PyPy for guidance.

>     Why should we bother reimplementing anything?  PyPy, for those who are
> unfamiliar, has the ability to Just-in-Time compile itself and programs that
> it's running.  One of the major advantages of this is that code operating on
> NumPy arrays could potentially be written in pure-python, with normal
> looping constructs, and be nearly as fast as a ufunc painstakingly crafted
> in C.  I'd love to see as much Python and as little C as possible, and I'm
> sure I'm not alone in that wish.

Your code has a fairly specialised application and it's worth
discussing exactly where it would fit in.  For example, from our
perspective rewriting things such as zeros(), ones(), etc. is not of
much interest.  However, the ability to whip up fast ufuncs and
generalised ufuncs is in great demand.  Also, it is sometimes clearer
to express an algorithm as

for i in range(n):
    for j in range(m):
        x[i, j] = some_op(x[i, j])

instead of vectorising the code.  Here, PyPy can provide a big speed
improvement.  I'm not sure, but it sounds like the "interface" you
refer to would be things such as the [] operator on arrays, for
example?
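
To make the comparison concrete, here is a minimal sketch of the two
styles side by side.  The body of some_op is a made-up placeholder (any
element-wise function would do), and the helper names are hypothetical.
The explicit loops are the form a JIT such as PyPy's would need to make
fast; the whole-array expression is what we would write in NumPy today.

```python
import numpy as np


def some_op(value):
    # Hypothetical element-wise operation, standing in for some_op above.
    return value * value + 1.0


def apply_with_loops(x):
    """Apply some_op to every element of a 2-D array, in place."""
    n, m = x.shape
    for i in range(n):
        for j in range(m):
            x[i, j] = some_op(x[i, j])
    return x


def apply_vectorised(x):
    """The same computation written as whole-array arithmetic."""
    return x * x + 1.0


a = np.arange(6, dtype=float).reshape(2, 3)
looped = apply_with_loops(a.copy())
vectorised = apply_vectorised(a)
```

Both versions give the same result; under CPython the looped one pays
interpreter overhead for every element.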

Just as an aside, I think PyPy would be perfect for managing sparse
matrices (such as scipy.sparse), where there are so many loops
involved.  In fact, an RPy implementation of scipy.sparse could be an
interesting proposal for the next SoC!
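
As an illustration of the kind of loop-heavy code I mean, here is a
rough pure-Python sketch of a matrix-vector product in compressed
sparse row (CSR) form.  The argument names data, indices and indptr
follow the attributes of scipy.sparse.csr_matrix; the function name
csr_matvec and the use of plain lists are assumptions made for this
example, not how scipy.sparse is actually implemented.

```python
def csr_matvec(data, indices, indptr, vec):
    """Multiply a CSR matrix by a dense vector using plain loops."""
    n_rows = len(indptr) - 1
    result = [0.0] * n_rows
    for row in range(n_rows):
        total = 0.0
        # The non-zeros of this row live in data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            total += data[k] * vec[indices[k]]
        result[row] = total
    return result


# CSR form of the matrix [[1, 0, 2], [0, 0, 3], [4, 5, 6]].
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
indices = [0, 2, 2, 0, 1, 2]
indptr = [0, 2, 3, 6]
product = csr_matvec(data, indices, indptr, [1.0, 1.0, 1.0])
```

With a vector of ones, each entry of product is simply the sum of the
corresponding row.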

I spoke briefly with Maciej the other day, and I realised that there
is a lot of detail on how PyPy interacts with C modules that we are
not aware of.  It would be great if you could elaborate a bit on the
way PyPy is able to access current C functionality.  For example, can
you use NumPy as is, and just replace functionality piece by piece, or
would you need to rewrite a large part of the interface at a time?

Regards
Stéfan


