[Numpy-discussion] Compare NumPy arrays with threshold
Nissim Derdiger
NissimD at elspec-ltd.com
Thu May 18 08:07:07 EDT 2017
Hi again,
Thanks for the responses to my question!
Robert's answer worked very well for me, except for one small issue.
This line:
    close_mask = np.isclose(MatA, MatB, Threshold, equal_nan=True)
returns each difference twice: once for j compared to i and once for i compared to j.
For example, for this input:
    MatA = [[10,20,30],[40,50,60]]
    MatB = [[10,30,30],[40,50,160]]
My old code returns:
    0,1,20,30
    1,3,60,160
Your code returns:
    0,1,20,30
    1,3,60,160
    0,1,30,20
    1,3,160,60
I could simply cut "close_mask" in half so that I get only one pass, but that does not seem efficient.
Any ideas?
Also, what should I change to support 3D arrays as well?
Thanks again,
Nissim.
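A note on the duplicate rows reported above: `np.isclose` returns one boolean per cell, so a mask built from it and passed to `np.nonzero` or `np.argwhere` should list each differing cell once. The doubled rows may therefore come from the surrounding loop rather than from the mask itself. Below is a minimal sketch that also handles 3D (or any N-D) input. The function name `far_cells` and the layout of the returned tuples are assumptions made for illustration; they do not appear in the thread.

```python
import numpy as np


def far_cells(a, b, rtol):
    """Return (index, a_value, b_value) for every cell that is not close.

    `far_cells` is a hypothetical helper; `rtol` plays the role of the
    `Threshold` that the thread passes as the third argument of np.isclose.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("arrays must have the same shape")

    # One boolean per cell: True where the values differ beyond the tolerance.
    far_mask = ~np.isclose(a, b, rtol=rtol, equal_nan=True)

    # argwhere yields one row of indices per True cell, for any number of
    # dimensions, in the same row-major order as boolean-mask indexing.
    indices = np.argwhere(far_mask)
    return [
        (tuple(idx), va, vb)
        for idx, va, vb in zip(indices.tolist(), a[far_mask].tolist(), b[far_mask].tolist())
    ]


MatA = [[10, 20, 30], [40, 50, 60]]
MatB = [[10, 30, 30], [40, 50, 160]]
for idx, va, vb in far_cells(MatA, MatB, 0.1):
    print(idx, va, vb)
```

For the `MatA`/`MatB` input from the message this reports two cells, with zero-based indices `(0, 1)` and `(1, 2)`. Because the index is a tuple of whatever length the arrays require, the same call works unchanged for 3D arrays.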
-----Original Message-----
From: NumPy-Discussion [mailto:numpy-discussion-bounces+nissimd=elspec-ltd.com at python.org] On Behalf Of numpy-discussion-request at python.org
Sent: Wednesday, May 17, 2017 8:17 PM
To: numpy-discussion at python.org
Subject: NumPy-Discussion Digest, Vol 128, Issue 18
Today's Topics:
1. Compare NumPy arrays with threshold and return the
differences (Nissim Derdiger)
2. Re: Compare NumPy arrays with threshold and return the
differences (Paul Hobson)
3. Re: Compare NumPy arrays with threshold and return the
differences (Robert Kern)
----------------------------------------------------------------------
Message: 1
Date: Wed, 17 May 2017 16:50:40 +0000
From: Nissim Derdiger <NissimD at elspec-ltd.com>
To: "numpy-discussion at python.org" <numpy-discussion at python.org>
Subject: [Numpy-discussion] Compare NumPy arrays with threshold and
return the differences
Message-ID:
<9EFE3345170EF24DB67C61C1B05EEEDB4073F384 at EX10.Elspec.local>
Content-Type: text/plain; charset="us-ascii"
Hi,
In my script, I need to compare big NumPy arrays (2D or 3D) and return a list of all cells whose difference is bigger than a defined threshold.
The comparison itself can easily be done with the "allclose" function, like this:
    Threshold = 0.1
    if np.allclose(Arr1, Arr2, Threshold, equal_nan=True):
        print('Same')
But this comparison does not return which cells are not the same.
The easiest (yet naive) way to find out which cells differ is to use simple for loops, like this:
    def CheckWhichCellsAreNotEqualInArrays(Arr1, Arr2, Threshold):
        if not Arr1.shape == Arr2.shape:
            return ['Arrays size not the same']
        Dimensions = Arr1.shape
        Diff = []
        for i in range(Dimensions[0]):
            for j in range(Dimensions[1]):
                if not np.allclose(Arr1[i][j], Arr2[i][j], Threshold, equal_nan=True):
                    Diff.append(',' + str(i) + ',' + str(j) + ',' + str(Arr1[i,j]) + ','
                                + str(Arr2[i,j]) + ',' + str(Threshold) + ',Fail\n')
        return Diff
(and the same for 3D arrays, with one more for loop). This way is very slow when the arrays are big and full of non-equal cells.
Is there a fast, straightforward way to get a list of the differing cells when the arrays are not the same? Maybe some built-in function in NumPy itself?
Thanks!
Nissim
------------------------------
Message: 2
Date: Wed, 17 May 2017 10:13:46 -0700
From: Paul Hobson <pmhobson at gmail.com>
To: Discussion of Numerical Python <numpy-discussion at python.org>
Subject: Re: [Numpy-discussion] Compare NumPy arrays with threshold
and return the differences
Message-ID:
<CADT3MEABot==+z_iL7qkzim0rDM+0hN4kP4W-veKeoqEW2pDrA at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
I would do something like:
diff_is_large = (array1 - array2) > threshold
index_at_large_diff = numpy.nonzero(diff_is_large)
array1[index_at_large_diff].tolist()
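The snippet above compares the signed difference, so cells where array2 is the larger value are not flagged, and it returns only the values taken from array1. A hedged variant that uses the absolute difference and also collects the matching array2 values might look like the sketch below. The sample arrays are the ones from the follow-up message, and `values1`/`values2` are illustrative names. Note that this applies an absolute threshold, unlike the relative tolerance used by `allclose`, and that cells containing NaN are never flagged because comparisons with NaN evaluate to False.

```python
import numpy as np

array1 = np.array([[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]])
array2 = np.array([[10.0, 30.0, 30.0], [40.0, 50.0, 160.0]])
threshold = 0.1

# Absolute difference, so the sign of (array1 - array2) does not matter.
diff_is_large = np.abs(array1 - array2) > threshold

# Tuple of index arrays, one per dimension; works for 2D and 3D alike.
index_at_large_diff = np.nonzero(diff_is_large)

# Values from both arrays at the flagged positions, in matching order.
values1 = array1[index_at_large_diff].tolist()
values2 = array2[index_at_large_diff].tolist()

for position, v1, v2 in zip(zip(*index_at_large_diff), values1, values2):
    print(tuple(int(k) for k in position), v1, v2)
```

With this data the signed comparison would flag nothing, since both differing cells have a negative difference, while the absolute version flags both of them.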
------------------------------
Message: 3
Date: Wed, 17 May 2017 10:16:09 -0700
From: Robert Kern <robert.kern at gmail.com>
To: Discussion of Numerical Python <numpy-discussion at python.org>
Subject: Re: [Numpy-discussion] Compare NumPy arrays with threshold
and return the differences
Message-ID:
<CAF6FJisn3Oj18HOOP-DJGOi7rTwr-1U4npef+wCd=ENnMkMFmw at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Use `close_mask = np.isclose(Arr1, Arr2, Threshold, equal_nan=True)` to return a boolean mask the same shape as the arrays which is True where the elements are close and False where they are not. You can invert it to get a boolean mask which is True where they are "far" with respect to the threshold: `far_mask = ~close_mask`. Then you can use `i_idx, j_idx = np.nonzero(far_mask)` to get arrays of the `i` and `j` indices where the values are far. For example:
    for i, j in zip(i_idx, j_idx):
        print("{0}, {1}, {2}, {3}, {4}, Fail".format(i, j, Arr1[i, j], Arr2[i, j], Threshold))
--
Robert Kern
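One detail worth spelling out about the recipe above: the third positional argument of `np.isclose` is `rtol`, a relative tolerance, so `Threshold` is scaled by the magnitude of the second array rather than applied as a fixed difference. The sketch below contrasts that with an absolute threshold passed through `atol`. The sample values and the names `relative_far` and `absolute_far` are made up for illustration.

```python
import numpy as np

Arr1 = np.array([[100.0, 1.0], [0.0, np.nan]])
Arr2 = np.array([[111.0, 1.05], [0.05, np.nan]])
Threshold = 0.1

# As in the thread: Threshold is used as rtol, so the test per cell is
# |Arr1 - Arr2| <= atol + rtol * |Arr2| with the default atol of 1e-8.
relative_far = ~np.isclose(Arr1, Arr2, rtol=Threshold, equal_nan=True)

# Alternative: treat Threshold as a fixed absolute difference instead.
absolute_far = ~np.isclose(Arr1, Arr2, rtol=0.0, atol=Threshold, equal_nan=True)

i_idx, j_idx = np.nonzero(relative_far)
for i, j in zip(i_idx, j_idx):
    print("{0}, {1}, {2}, {3}, {4}, Fail".format(i, j, Arr1[i, j], Arr2[i, j], Threshold))
```

Here the relative test flags only the cell at (1, 0), where the reference value is close to zero, while the absolute test flags only the cell at (0, 0), where the values differ by 11. Which behaviour is wanted depends on what "threshold" is meant to express.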