[SciPy-dev] Question: general heap implementation versus specific?

Anne Archibald peridot.faceted at gmail.com
Mon Mar 2 23:24:03 EST 2009


2009/3/2 Thouis (Ray) Jones <thouis at broad.mit.edu>:
> Lee Kamentsky has been working on a replacement for the watershed
> image transformation in scipy (the current one isn't exactly broken,
> but behaves in unexpected ways for some images).  As part of this work
> he implemented heaps in cython, specific to this problem (integers, 4
> entries per heap element).  I spent some time generalizing it to
> arbitrary number of entries per element, with the thought that it
> could be of use elsewhere in scipy.
>
> I wonder now if that was worthwhile.  The performance cost of the
> generalization isn't horrible, but not negligible, and I wonder if it
> the code can actually be used elsewhere effectively.  For instance,
> the recent kdTree code has a heap implementation as well, but I don't
> think it could take advantage of this code (I haven't looked too
> closely).

I wrote the heap implementation in cKDTree, and I wrestled with the
same question. As JWZ put it in the XKeyCaps source "I'd just like to
take this moment to point out that C has all the expressive power of
two dixie cups and a string." The problem is really one of writing
generic data structures in C.

For a heap, I see two relevant questions: what is the type of the
keys? and how big are the objects being stored? For the type of the
keys, you clearly want different implementations for each of int,
short int, float, double, user-defined type (with comparison
function). If the stored objects are small - ints or the like - you
can stow them in the heap directly, copying when you have to move heap
objects around. If they're large, then you want to store pointers to
them, and have some well-defined memory allocation and deallocation
syntax. (The cKDTree code deals with this an an ad-hoc manner with a
union.)

While C++ templates should be able to help with this sort of thing, I
think as long as one is sticking to C and to cython, there's no good
solution to making a general-enough library. Of course, if you want a
fully-flexible heap at the python level, that exists already. But I
think, annoying as it is, we're stuck reimplementing heaps.

That said, I suspect that with a little care it might be possible to
merge the heap implementations between cKDTree and the image
transformations.

Anne

> Any opinions or advice on the relative cost of generality and
> simplicity versus performance?
>
> I've attached the current heap implementation for reference.
>
> Ray Jones
>
> _______________________________________________
> Scipy-dev mailing list
> Scipy-dev at scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-dev
>
>



More information about the SciPy-Dev mailing list