[SciPy-dev] parallelizing cKDTRee

Wed Jan 7 10:55:38 EST 2009

2009/1/7 Sturla Molden <sturla at molden.no>:

> Speed is very important when searching kd-trees; otherwise we should not
> be using kd-trees but brute force. Thus exploiting multiple processors
> are important as well.

Very sensible.

> 1. Multiprocessing:
> Must add support for pickling and unpickling to cKDTree (i.e. __reduce__
> and __setstate__ methods). This would be useful for saving to disk as well.

This would be a good idea. One labor-saving device would be to take
advantage of the fact that construction is pretty cheap and have the
pickle method pickle only the construction parameters and the
underlying array. Not that pickling the tree structure would be very
hard either. One could also implement conversion beween C and python
kdtrees, which would make it easier to modify existing kdtrees or
implement new algorithms.

> 2. Multithreading (Python):
> cKDTree.query calls cKDTree.__query with the GIL released (i.e. a 'with
> nogil:' block). I think this will be safe.

This looks very easy and should pose no problems.

> 3. Multithreading (Cython):
> We could simply call cKDTree.__query in parallel using OpenMP pragmas.
> It would be a simple and quite portable hack.

Are there compilation issues with using OpenMP? Otherwise this should
be fairly easy too, although selecting the right degree of parallelism
may be a problem (I have no experience with OpenMP).

> Which do you prefer? All three?

I think all three are good ideas; I'd start with the low-hanging
fruit, which is just releasing the GIL. Then pickling, which is useful
for other purposes, and then OpenMP if it's useful.

I won't have time to do anything about this for a week or two.

Anne