[Numpy-discussion] From Python to Numpy

Matthew Harrigan harrigan.matthew at gmail.com
Mon Jan 9 20:55:03 EST 2017


I also have been stalking this email thread.  First, excellent book!

Regarding the vectorization example mentioned above, one thing to note is
that it increases the order of the algorithm relative to the pure python.
The vectorized approach uses correlate, which requires ~(len(seq) *
len(sub)) FLOPs.  In the case where the first element in sub is not equal
to the vast majority of elements in seq, the basic approach requires
~len(seq) comparisons.  Note that is the case in the SO answer.  One fairly
common thing I have seen in vectorized approaches is that the memory or
operations required scales worse than strictly required.  It may or may not
be an issue, largely depends on the specifics of how its used, but it
usually indicates a better approach exists.  That may be worth mentioning
here.

Given that, I tried to come up with an "ideal" approach.  stride_tricks can
be used to convert seq to a 2D array, and then ideally each row could be
compared to sub.  However I can't think of how to do that with numpy
function calls other than compare each element in the 2D array, requiring
O(n_sub*n_seq) operations again.  array_equal
<https://docs.scipy.org/doc/numpy/reference/generated/numpy.array_equal.html>
is an example of that.  Python list equality scales better, for instance if
x = [0]*n and y = [1]*n, x == y is very fast and the time is independent of
the value of n.

It seems a generalized ufunc "all_equal" with signature (i),(i)->() and
short circuit logic once the first non equal element is encountered would
be an important performance improvement.  In the ideal case it is
dramatically faster, and even if every element must be compared then its
still probably meaningfully faster since the boolean intermediate array
isn't created.  Even better would be to get the axis argument in place for
generalized ufuncs.  Then this problem could be vectorized in one line with
far better performance.  If others think this is a good idea I will post an
issue and attempt a solution.


On Sat, Dec 31, 2016 at 5:23 AM, Nicolas P. Rougier <
Nicolas.Rougier at inria.fr> wrote:

>
> > I’ve seen vectorisation taken to the extreme, with negative consequences
> in terms of both speed and readability, in both Python and MATLAB
> codebases, so I would suggest some discussion / wisdom about when not to
> vectorise.
>
>
> I agree and there is actually a warning in the introduction about
> readability vs speed with an example showing a clever optimization (by
> Jaime Fernández del Río) that is hardly readable for the non-experts
> (including myself).
>
>
> Nicolas
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20170109/3c0da8da/attachment.html>


More information about the NumPy-Discussion mailing list