[SciPy-Dev] Enhancements to scipy.spatial.cKDTree

Sturla Molden sturla at molden.no
Fri Jul 13 07:02:09 EDT 2012


I've quickly looked at the code, most of it looks good.

Care if I help?

I'd like to focus on these changes first:

** Make sure the code behaves correctly on 64-bit. Currently there are 
numerous possibilities for integer overflow, as C int is used 
consistently instead of npy_intp or Py_ssize_t.

** Make sure malloc and free does not leak memory in case of a Python 
exception. This means either using try/finally religiously or wrap it in 
a Python object that use "RAII" with __cinit__, __init__ and 
__dealloc__. Currently it is littered with possible memory leaks.

** Raise MemoryError on malloc failure.

** Use PyMem_Malloc/PyMem_Free instead of libc, so we are sure to use 
the same heap as Python.

** Correct the "ndarray.data" issue, but I will refrain from using typed 
memoryviews to maintain compatibility with Python 2.4. That will 
probably mean to use the Python API directly (PyArrayObject and 
PyArray_DATA) instead of Cython's numpy.pxd.

** Use Cython's fused types (a bloatware generator) instead of C double 
to autogenerate cKDTrees for any NumPy dtype.

** Avoid using struct hacks to accomodate tree node structs of different 
sizes (or else we are only compatible with C99).

** Start to release the GIL wherever possible, as searching a kd-tree is 
expensive. (Not just for the sake of multi-threading, but also to avoid 
locking up the interpreter forever.)


Sturla





On 13.07.2012 09:09, Ralf Gommers wrote:
>
>
> On Fri, Jul 13, 2012 at 1:34 AM, Patrick Varilly <patvarilly at gmail.com
> <mailto:patvarilly at gmail.com>> wrote:
>
>     Alright, I've uploaded the last bit of cKDTree that was missing for
>     it to be functionally equivalent to KDTree.  As it stands, I think
>     it's a useful addition in its own right, so it would be nice if
>     someone else could look the code over and see if it can be merged in.
>
>     Over the coming weeks, I will look into the issues that Sturla has
>     brought up and see if I can make some progress on these.
>
>
> Please note that memoryviews can't be used yet in scipy due them not
> working with python 2.4 (see https://github.com/numpy/numpy/pull/307).
> All the other things Sturla mentioned do sound like improvements that
> can be made.
>
> Cheers,
> Ralf
>
>
>
>     All the best,
>
>     Patrick
>
>
>     On Thu, Jul 12, 2012 at 5:42 PM, Sturla Molden <sturla at molden.no
>     <mailto:sturla at molden.no>> wrote:
>
>         On 12.07.2012 00:26, Patrick Varilly wrote:
>          > On Tue, Jul 10, 2012 at 12:01 PM, Sturla Molden
>         <sturla at molden.no <mailto:sturla at molden.no>
>          > <mailto:sturla at molden.no <mailto:sturla at molden.no>>> wrote:
>          >
>          >     At least cKDTree have to be fixed, it will break as soon
>         as the move to
>          >     PyArray_DATA is mandatory.
>          >
>          >     Preferably we should use Cython memoryviews and
>         multidimensional arrays
>          >     in the code, instead of just C pointer artithmetics
>         (which is harder to
>          >     understand). That will make the Cython code more readable
>         to NumPy
>          >     users.
>          >
>          >     The GIL issue should also be fixed, as searching might
>         take a while.
>          >
>          > I'm relatively new to Cython.  Could you tell me where I
>         could read up
>          > on these issues?
>
>         The main issue is the use of the .data attribute. See here:
>
>         http://wiki.cython.org/tutorials/NumpyPointerToC
>
>         Another is that Cython's ndarray interface is (more or less)
>         deprecated
>         in favour of typed memoryviews:
>
>         http://docs.cython.org/src/userguide/memoryviews.html
>
>         So preferably the cKDTree code should use these, but I my experience
>         they can generate compile-time warnings.
>
>         There is also a 64-bit issue with cKDTree if I remember
>         correctly. And
>         the only dtype it supports is float64. We should replace the current
>         pointer artimetics with multidimensional arrays. It had (or
>         still has)
>         non-portable code like dependency on unions and binary layout
>         (tree and
>         heap nodes). And there the issue of making it release the GIL
>         whenever
>         it should. So several things needs be fixed.
>
>         Sturla
>         _______________________________________________
>         SciPy-Dev mailing list
>         SciPy-Dev at scipy.org <mailto:SciPy-Dev at scipy.org>
>         http://mail.scipy.org/mailman/listinfo/scipy-dev
>
>
>
>     _______________________________________________
>     SciPy-Dev mailing list
>     SciPy-Dev at scipy.org <mailto:SciPy-Dev at scipy.org>
>     http://mail.scipy.org/mailman/listinfo/scipy-dev
>
>
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev




More information about the SciPy-Dev mailing list