[Numpy-discussion] Compare NumPy arrays with threshold

Nissim Derdiger NissimD at elspec-ltd.com
Thu May 18 08:07:07 EDT 2017


Hi again,
Thanks for the responses to my question!
Robert's answer worked very well for me, except for one small issue:

This line:
close_mask = np.isclose(MatA, MatB, Threshold, equal_nan=True)
returns each difference twice: once for i compared with j and once for j compared with i.

For example, for this input:
MatA = [[10,20,30],[40,50,60]]
MatB = [[10,30,30],[40,50,160]]

My old code returns:
0,1,20,30
1,3,60,160
Your code returns:
0,1,20,30
1,3,60,160
0,1,30,20
1,3,160,60


I can simply cut "close_mask" in half so that each difference is iterated only once, but that does not seem efficient.
Any ideas?
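
A minimal sketch for checking whether the mask itself produces duplicates, using the input above and assuming Threshold = 0.1 (the value from the original question); `far_mask` and `rows` are illustrative names only:

```python
import numpy as np

MatA = np.array([[10, 20, 30], [40, 50, 60]], dtype=float)
MatB = np.array([[10, 30, 30], [40, 50, 160]], dtype=float)
Threshold = 0.1  # assumed value; passed positionally it acts as rtol

close_mask = np.isclose(MatA, MatB, Threshold, equal_nan=True)
far_mask = ~close_mask

# np.argwhere yields one (i, j) row per True cell of the mask.
rows = [(int(i), int(j), float(MatA[i, j]), float(MatB[i, j]))
        for i, j in np.argwhere(far_mask)]
for row in rows:
    print(row)
```

With this input the sketch reports two cells, one per differing position, so any duplicated lines would have to come from code outside these few lines.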



Also, what should I change to support 3D arrays as well?
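
One possible way to handle 2D and 3D arrays with the same code is np.argwhere, which returns one row of indices per True cell regardless of the number of dimensions. A sketch, where the function name `find_far_cells` and its return format are hypothetical:

```python
import numpy as np

def find_far_cells(arr1, arr2, threshold):
    """Return (index_tuple, value1, value2) for every cell that is not close.

    Hypothetical helper; threshold is used as rtol, as in the thread.
    """
    if arr1.shape != arr2.shape:
        raise ValueError("Arrays size not the same")
    far_mask = ~np.isclose(arr1, arr2, rtol=threshold, equal_nan=True)
    # argwhere returns an (N, ndim) array: one row of indices per far cell.
    return [(tuple(int(k) for k in idx), arr1[tuple(idx)], arr2[tuple(idx)])
            for idx in np.argwhere(far_mask)]

# Made-up 3D sample data with a single differing cell.
a = np.full((2, 2, 2), 10.0)
b = a.copy()
b[1, 0, 1] = 50.0
result = find_far_cells(a, b, 0.1)
print(result)
```

The same call works unchanged on 2D input, since the index tuple simply has two entries instead of three.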


Thanks again,
Nissim.




-----Original Message-----
From: NumPy-Discussion [mailto:numpy-discussion-bounces+nissimd=elspec-ltd.com at python.org] On Behalf Of numpy-discussion-request at python.org
Sent: Wednesday, May 17, 2017 8:17 PM
To: numpy-discussion at python.org
Subject: NumPy-Discussion Digest, Vol 128, Issue 18


Today's Topics:

   1. Compare NumPy arrays with threshold and return the
      differences (Nissim Derdiger)
   2. Re: Compare NumPy arrays with threshold and return the
      differences (Paul Hobson)
   3. Re: Compare NumPy arrays with threshold and return the
      differences (Robert Kern)


----------------------------------------------------------------------

Message: 1
Date: Wed, 17 May 2017 16:50:40 +0000
From: Nissim Derdiger <NissimD at elspec-ltd.com>
To: "numpy-discussion at python.org" <numpy-discussion at python.org>
Subject: [Numpy-discussion] Compare NumPy arrays with threshold and
        return the differences
Message-ID:
        <9EFE3345170EF24DB67C61C1B05EEEDB4073F384 at EX10.Elspec.local>
Content-Type: text/plain; charset="us-ascii"

Hi,

In my script, I need to compare big NumPy arrays (2D or 3D) and return a list of all cells whose difference is bigger than a defined threshold.
The comparison itself can easily be done with the "allclose" function, like this:
import numpy as np
Threshold = 0.1
# The third positional argument of np.allclose is rtol (relative tolerance).
if np.allclose(Arr1, Arr2, Threshold, equal_nan=True):
    print('Same')

But this comparison does not return which cells are not the same.

The easiest (yet naive) way to know which cells are not the same is to use simple nested for loops like these:
def CheckWhichCellsAreNotEqualInArrays(Arr1, Arr2, Threshold):
    # Both arrays must have the same shape to be compared cell by cell.
    if Arr1.shape != Arr2.shape:
        return ['Arrays size not the same']
    Dimensions = Arr1.shape
    Diff = []
    for i in range(Dimensions[0]):
        for j in range(Dimensions[1]):
            # Threshold is passed positionally, so np.allclose treats it as rtol.
            if not np.allclose(Arr1[i, j], Arr2[i, j], Threshold, equal_nan=True):
                Diff.append(',' + str(i) + ',' + str(j) + ',' + str(Arr1[i, j]) + ','
                            + str(Arr2[i, j]) + ',' + str(Threshold) + ',Fail\n')
    # Return only after every row has been checked.
    return Diff

(The same applies to 3D arrays, with one more for loop.) This approach is very slow when the arrays are big and full of non-equal cells.

Is there a fast, straightforward way to get a list of the differing cells when the arrays are not the same? Maybe some built-in function in NumPy itself?
Thanks!
Nissim


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20170517/a8bfd324/attachment-0001.html>

------------------------------

Message: 2
Date: Wed, 17 May 2017 10:13:46 -0700
From: Paul Hobson <pmhobson at gmail.com>
To: Discussion of Numerical Python <numpy-discussion at python.org>
Subject: Re: [Numpy-discussion] Compare NumPy arrays with threshold
        and return the differences
Message-ID:
        <CADT3MEABot==+z_iL7qkzim0rDM+0hN4kP4W-veKeoqEW2pDrA at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I would do something like:

diff_is_large = (array1 - array2) > threshold
index_at_large_diff = numpy.nonzero(diff_is_large)
array1[index_at_large_diff].tolist()
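
Note that the snippet above only flags cells where array1 exceeds array2 by more than the threshold. A small variation, sketched here with made-up sample data, takes the absolute difference so that differences in both directions are caught. It treats the threshold as an absolute tolerance, unlike the relative one used by the positional argument of np.allclose:

```python
import numpy

# Sample data and threshold are illustrative assumptions.
array1 = numpy.array([[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]])
array2 = numpy.array([[10.0, 30.0, 30.0], [40.0, 50.0, 160.0]])
threshold = 5.0

# abs() makes the test symmetric: array1 < array2 is flagged as well.
diff_is_large = numpy.abs(array1 - array2) > threshold
index_at_large_diff = numpy.nonzero(diff_is_large)
values1 = array1[index_at_large_diff].tolist()
values2 = array2[index_at_large_diff].tolist()
print(values1, values2)
```

Without numpy.abs, neither of the two differing cells in this sample would be reported, because array1 is the smaller value in both.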



------------------------------

Message: 3
Date: Wed, 17 May 2017 10:16:09 -0700
From: Robert Kern <robert.kern at gmail.com>
To: Discussion of Numerical Python <numpy-discussion at python.org>
Subject: Re: [Numpy-discussion] Compare NumPy arrays with threshold
        and return the differences
Message-ID:
        <CAF6FJisn3Oj18HOOP-DJGOi7rTwr-1U4npef+wCd=ENnMkMFmw at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Use `close_mask = np.isclose(Arr1, Arr2, Threshold, equal_nan=True)` to return a boolean mask with the same shape as the arrays, which is True where the elements are close and False where they are not. You can invert it to get a boolean mask which is True where they are "far" with respect to the threshold: `far_mask = ~close_mask`. Then you can use `i_idx, j_idx = np.nonzero(far_mask)` to get arrays of the `i` and `j` indices where the values are far. For example:

for i, j in zip(i_idx, j_idx):
    print("{0}, {1}, {2}, {3}, {4}, Fail".format(i, j, Arr1[i, j], Arr2[i, j], Threshold))
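
One detail worth keeping in mind: the third positional argument of np.isclose is rtol, so Threshold in the call above acts as a relative tolerance, the test being abs(a - b) <= atol + rtol * abs(b). If an absolute threshold is what is wanted, a sketch along these lines may be closer (the sample values are made up):

```python
import numpy as np

Arr1 = np.array([[100.0, 1.0], [np.nan, 5.0]])
Arr2 = np.array([[105.0, 1.05], [np.nan, 5.0]])
Threshold = 0.1

# Relative: the allowed difference scales with abs(Arr2) (plus the default atol).
far_relative = ~np.isclose(Arr1, Arr2, rtol=Threshold, equal_nan=True)
# Absolute: the allowed difference is Threshold for every cell.
far_absolute = ~np.isclose(Arr1, Arr2, rtol=0.0, atol=Threshold, equal_nan=True)

print(np.argwhere(far_relative).tolist())
print(np.argwhere(far_absolute).tolist())
```

Here the difference of 5 between 100 and 105 passes the relative test but fails the absolute one, while the matching NaN cells are treated as equal in both cases because of equal_nan=True.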

--
Robert Kern

------------------------------

Subject: Digest Footer

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


------------------------------

End of NumPy-Discussion Digest, Vol 128, Issue 18
*************************************************


