[Numpy-discussion] Improving Complex Comparison/Ordering in Numpy

Fri Jun 5 00:21:35 EDT 2020

Corresponding pandas issue:
https://github.com/pandas-dev/pandas/issues/28050

On Thu, Jun 4, 2020 at 9:17 PM Rakesh Vasudevan <rakesh.nvasudev at gmail.com>
wrote:

> Hi all,
>
> As a follow up to gh-15981 <https://github.com/numpy/numpy/issues/15981>,
> I would like to propose a change to bring complex dtype(s) comparison
> operators and related functions, in line with respective cpython
> implementations.
>
> The current state of complex dtype comparisons/ordering as summarised in
> the issue is as follows:
>
> # In python
>
> >>> cnum = 1 + 2j
> >>> cnum_two = 1 + 3j
>
> # Doing a comparison yields
> >>> cnum > cnum_two
>
> TypeError: '>' not supported between instances of 'complex' and 'complex'
>
>
> # Doing the same comparison with NumPy scalars
>
> >>> np.array(cnum) > np.array(cnum_two)
>
> # Yields
>
> False
>
>
> *NOTE*: only the ordering operators (>, <, >=, <=) fail on complex numbers
> in Python; equality (==) does work.
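For reference, the asymmetry between ordering and equality described in the note can be checked in plain Python. This is only a minimal sketch; the helper name `ordering_supported` is made up for illustration and is not part of Python or NumPy.

```python
import operator

cnum = 1 + 2j
cnum_two = 1 + 3j


def ordering_supported(op, a, b):
    """Return True if op(a, b) evaluates, False if it raises TypeError."""
    try:
        op(a, b)
    except TypeError:
        return False
    return True


# The four ordering operators are rejected for complex operands ...
ordering_ops = (operator.gt, operator.lt, operator.ge, operator.le)
ordering_results = [ordering_supported(op, cnum, cnum_two) for op in ordering_ops]

# ... while equality and inequality are well defined.
equal = cnum == cnum_two
not_equal = cnum != cnum_two
```

The proposal would make NumPy follow the same split: ordering comparisons rejected, equality kept.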
>
> Similarly, sorting uses the comparison operators behind the scenes to order
> complex values. Again, this behavior diverges from the default Python behavior.
>
> # In native python
> >>> clist = [cnum, cnum_two]
> >>> sorted(clist, key=lambda c: (c.real, c.imag))
> [(1+2j), (1+3j)]
>
> # In numpy
>
> >>> np.sort(clist) # Uses the default comparison order
>
> # Yields the same result
>
> # To get a cpython like sorting call we can do the following in numpy
> # (lexsort uses the last key as the primary key, so real goes last)
> >>> carr = np.asarray(clist)
> >>> np.take_along_axis(carr, np.lexsort((carr.imag, carr.real), 0), 0)
>
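To make the comparison above concrete, here is a small sketch (not taken from the original mail) that sorts the same values three ways: NumPy's default complex ordering, an explicit lexsort, and plain Python with a key. The sample array and the variable names are assumptions chosen for illustration.

```python
import numpy as np

carr = np.array([1 + 3j, 2 + 1j, 1 + 2j])

# Default NumPy ordering for complex values: real part first, then imaginary.
default_sorted = np.sort(carr)

# Same ordering built explicitly. np.lexsort treats the LAST key as the
# primary one, so the real part goes last in the tuple.
order = np.lexsort((carr.imag, carr.real))
lexsorted = np.take_along_axis(carr, order, axis=0)

# Plain-Python equivalent with an explicit key.
py_sorted = sorted(carr.tolist(), key=lambda c: (c.real, c.imag))
```

All three orderings should agree for this input, which is why the key order passed to lexsort matters.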
>
> This proposal aims to bring parity between default python handling of
> complex numbers and handling complex types in numpy
>
> This is a two-step process
>
>
>    1. Sort complex numbers in a Pythonic way, accepting key arguments,
>    and deprecate usage of sort() on complex numbers without a key argument
>       1. Possibly extend this to max(), min(), if it makes sense to do
>       so.
>       2. Since sort() is being updated for complex numbers,
>       searchsorted() is also a good candidate for implementing this change.
>    2. Once this is done, we can deprecate the usage of comparison
>    operators (>, <, >= , <=) on complex dtypes
>
>
>
>
> *Handling sort() for complex numbers*
> There are two approaches we can take for this
>
>
>    1. Update the sort() method to accept a ‘key’ kwarg. When a key is
>    passed, use lexsort to get the indices and sort with them. We could
>    support lambda function keys like Python, but that is likely to be very
>    slow.
>    2. Create a new wrapper function sort_by() (placeholder name;
>    requesting name suggestions/feedback) that essentially acts as syntactic
>    sugar for
>       1. np.take_along_axis(carr, np.lexsort((carr.imag, carr.real),
>       0), 0), where carr = np.asarray(clist)
>
>
>    1. Improve the existing sort_complex() method with the new key
>    functionality (though the change will only affect complex dtypes).
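As a rough illustration of approach 2, a sort_by() wrapper could look like the sketch below. The name comes from the proposal, but the signature (`keys` given in priority order, plus an `axis` argument) is purely an assumption and nothing like it exists in NumPy today.

```python
import numpy as np


def sort_by(arr, keys, axis=-1):
    """Hypothetical helper: sort `arr` along `axis` by `keys`.

    `keys` is a sequence of arrays with the same shape as `arr`, listed
    from most significant to least significant.
    """
    arr = np.asarray(arr)
    # np.lexsort treats the LAST key as the primary one, so reverse the
    # caller's priority order before handing the keys over.
    order = np.lexsort(tuple(np.asarray(k) for k in reversed(keys)), axis=axis)
    return np.take_along_axis(arr, order, axis=axis)


carr = np.array([2 + 1j, 1 + 3j, 1 + 2j])
by_real_then_imag = sort_by(carr, (carr.real, carr.imag))
by_imag_then_real = sort_by(carr, (carr.imag, carr.real))
```

Taking arrays as keys rather than a callable keeps the work inside lexsort, which avoids the per-element Python calls that make lambda keys slow.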
>
> We could choose either method; both have pros and cons. Approach 1 brings
> the sort function signature closer to its Python counterpart, while
> approach 2 keeps a clearer distinction between the two ways of
> sorting. The performance of the approach 1 function would vary, since the
> key is an optional argument. Would love the community’s thoughts on this.
>
>
> *Handling min() and max() for complex numbers*
>
> Since min and max are essentially a set of comparisons, in python they are
> not allowed on complex numbers
>
> >>> clist = [cnum, cnum_two]
> >>> min(clist)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: '<' not supported between instances of 'complex' and 'complex'
>
> # But using the key argument again works
> >>> min(clist, key=lambda c: (c.real, c.imag))
>
> We could use a similar key kwarg for min() and max() in numpy, but the
> question remains how we handle the keys in this use case. The naive way would
> be to sort() on the keys and take the last or first element, which is likely
> to be slow. Requesting suggestions on approaching this.
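One possible way to avoid the full sort, offered only as a sketch: for a fixed (real, imag) key, the extreme element can be found in two linear passes. The function names `complex_min` and `complex_max` are hypothetical, and the sketch ignores NaN handling, empty inputs, and the axis argument.

```python
import numpy as np


def complex_min(values):
    """Smallest element by (real, imag), found without a full sort."""
    arr = np.asarray(values).ravel()
    # Pass 1: keep only the elements that share the smallest real part.
    candidates = arr[arr.real == arr.real.min()]
    # Pass 2: among those, pick the smallest imaginary part.
    return candidates[np.argmin(candidates.imag)]


def complex_max(values):
    """Largest element by (real, imag), found without a full sort."""
    arr = np.asarray(values).ravel()
    candidates = arr[arr.real == arr.real.max()]
    return candidates[np.argmax(candidates.imag)]


clist = [1 + 3j, 2 - 1j, 1 + 2j, 2 + 0j]
smallest = complex_min(clist)
largest = complex_max(clist)
```

This only covers the fixed lexicographic key; arbitrary user-supplied keys would still need a more general mechanism.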
>
> *Comments on isclose()*
> Both Python and numpy use the absolute value/magnitude to decide whether
> two values are close enough. Hence I do not see this change affecting this
> function.
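A quick check of that point, as a sketch: on the Python side the complex-aware function is `cmath.isclose` (`math.isclose` does not accept complex arguments), and both it and `np.isclose` work from the magnitude of the difference, so no ordering of complex values is involved. The sample values are arbitrary.

```python
import cmath

import numpy as np

a = 1 + 2j
b = a + 1e-12j  # tiny perturbation of the imaginary part
c = 1 + 3j      # clearly different value

# Standard library: tolerance test on abs(a - b).
py_close = cmath.isclose(a, b)
py_far = cmath.isclose(a, c)

# NumPy: abs(a - b) <= atol + rtol * abs(b), with the default tolerances.
np_close = bool(np.isclose(a, b))
np_far = bool(np.isclose(a, c))
```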
>
> Requesting feedback and suggestions on the above.
>
> Thank you,
>
> Rakesh
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>