[SciPy-User] optimize a piece of code pruning arrays based on distance

Nicolas Cellier contact at nicolas-cellier.net
Wed Jan 24 06:05:21 EST 2018


You may be able to vectorise that piece of code with the np.where
<https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.where.html>
function.

Otherway, your code may be optimized with the just-in-time compiler numba
<https://numba.pydata.org/>.

Good luck !

2018-01-24 11:44 GMT+01:00 William Heymann <immudzen at gmail.com>:

> Hello,
>
> [Sorry if this is a duplicate. I realized I sent my previous email from
> the wrong address]
>
> I am trying to find a way to use scipy/numpy to optimize a piece of code.
> From profiling I already know over 90% of the runtime of the function is in
> 4 lines of code. This code is from selSPEA2 in DEAP in case that matters.
>
> for i in range(N):
>     for j in range(1, size - 1):
>             if sorted_indices[i,j] == min_pos:
>                 sorted_indices[i,j:size - 1] = sorted_indices[i,j + 1:size]
>                 sorted_indices[i,j:size] = min_pos
>                 break
>
> The inner loop is inherently parallel since it is each iteration does not
> depend on any other. With C++ I would use OpenMP or TBB to parallelize the
> outer loop.
>
> For a more complete picture I have also included the surrounding code. The
> basic problem is if the number of entries (size) is greater than the number
> allowed to be kept (k) for the next generation then the most similar items
> (based on distance) need to be removed until size == k. At each iteration a
> solution with the lowest distance to another solution is removed and then
> the distance from the remove solution to all other solutions is set as
> infinity. This prevents removing all points in a cluster.
>
> Thank you
>
>
> while size > k:
>     # Search for minimal distance
>     min_pos = 0
>     for i in range(1, N):
>         for j in range(1, size):
>             dist_i_sorted_j = distances[i,sorted_indices[i,j]]
>             dist_min_sorted_j = distances[min_pos,sorted_indic
> es[min_pos,j]]
>
>             if dist_i_sorted_j < dist_min_sorted_j:
>                 min_pos = i
>                 break
>             elif dist_i_sorted_j > dist_min_sorted_j:
>                 break
>
>     distances[:,min_pos] = float("inf")
>     distances[min_pos,:] = float("inf")
>
>     #This part is still expensive but I don't know a better way to do it
> yet.
>     #Essentially all remaining time in this function is in this section
>     #It may even make sense to do this in C++ later since it is trivially
> parallel
>     for i in range(N):
>         for j in range(1, size - 1):
>             if sorted_indices[i,j] == min_pos:
>                 sorted_indices[i,j:size - 1] = sorted_indices[i,j + 1:size]
>                 sorted_indices[i,j:size] = min_pos
>                 break
>
>     # Remove corresponding individual from chosen_indices
>     to_remove.append(min_pos)
>     size -= 1
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at python.org
> https://mail.python.org/mailman/listinfo/scipy-user
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-user/attachments/20180124/79477bca/attachment.html>


More information about the SciPy-User mailing list