[SciPy-Dev] Proposal for a new function nanpdist that treats NaNs as missing values

Moritz Beber moritz.beber at gmail.com
Wed Aug 13 11:08:35 EDT 2014


Dear all,

As suggested in this GitHub issue
(https://github.com/scipy/scipy/issues/3870), I would like to discuss the
merit of introducing a new function, nanpdist, into scipy.spatial. I have
also raised the problem of computing pairwise distances with missing values
in a previous e-mail
(http://comments.gmane.org/gmane.comp.python.scientific.devel/18956) and on
Stack Overflow
(http://stackoverflow.com/questions/24781461/compute-the-pairwise-distance-in-scipy-with-missing-values).
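To illustrate the problem, here is a minimal NumPy-only sketch. The helper
condensed_euclidean is hypothetical (it is not part of SciPy); it computes
Euclidean distances in the same condensed, upper-triangle order that pdist
uses. A single NaN in a row turns every distance involving that row into NaN:

```python
import numpy as np

def condensed_euclidean(X):
    """Hypothetical helper: pairwise Euclidean distances in condensed
    (upper-triangle, row-major) order, the same layout pdist returns."""
    X = np.asarray(X, dtype=float)
    # Index pairs (i, j) with i < j, in the order pdist enumerates them.
    i, j = np.triu_indices(X.shape[0], k=1)
    # Any NaN in row i or row j makes the squared difference NaN,
    # and np.sum propagates it to the whole distance.
    return np.sqrt(np.sum((X[i] - X[j]) ** 2, axis=1))

X = np.array([[0.0, 0.0, 0.0],
              [3.0, 4.0, np.nan],   # one missing value in row 1
              [3.0, 4.0, 0.0]])
d = condensed_euclidean(X)
# Pairs (0, 1) and (1, 2) come out as nan; only pair (0, 2) is usable.
print(d)
```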

Warren suggested three ways to tackle this problem:

   1. Don't change anything--the users should clean up their data!
   2. nanpdist
   3. Add a keyword argument to pdist that determines how NaNs should be
   treated.

Clearly, I don't favor the first option, since I believe missing values can
themselves carry important information. I lean slightly towards option two
because adding a keyword would further complicate the already very long
pdist function.
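For concreteness, a rough sketch of what nanpdist could do, assuming the
Euclidean metric only and pairwise deletion: each pair of rows is compared
only on the coordinates observed (non-NaN) in both, and a pair with no shared
observed coordinate gets NaN. The name nanpdist and this behaviour are
illustrative assumptions, not an existing SciPy API:

```python
import numpy as np

def nanpdist(X):
    """Hypothetical sketch: condensed Euclidean distances that treat NaNs
    as missing values via pairwise deletion."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    observed = ~np.isnan(X)          # True where a value is present
    out = np.empty(n * (n - 1) // 2)
    k = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Keep only the coordinates present in both rows.
            mask = observed[i] & observed[j]
            if mask.any():
                diff = X[i, mask] - X[j, mask]
                out[k] = np.sqrt(np.dot(diff, diff))
            else:
                # Nothing to compare: the distance is genuinely unknown.
                out[k] = np.nan
            k += 1
    return out

X = np.array([[0.0, 0.0, 0.0],
              [3.0, 4.0, np.nan],
              [3.0, 4.0, 0.0]])
d = nanpdist(X)
# Pair (0, 1) is now compared on the first two coordinates only.
print(d)
```

Pairwise deletion makes distances computed over fewer shared coordinates
smaller. One possible adjustment, used by scikit-learn's
nan_euclidean_distances, is to scale the squared distance by the ratio of
total to observed coordinates before taking the square root.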

I'm happy to submit a pull request if there is a consensus that something
should be done.

Best,

Moritz

