[SciPy-User] optimize a piece of code pruning arrays based on distance
Nicolas Cellier
contact at nicolas-cellier.net
Wed Jan 24 06:05:21 EST 2018
You may be able to vectorise that piece of code with the np.where
<https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.where.html>
function.
Otherway, your code may be optimized with the just-in-time compiler numba
<https://numba.pydata.org/>.
Good luck !
2018-01-24 11:44 GMT+01:00 William Heymann <immudzen at gmail.com>:
> Hello,
>
> [Sorry if this is a duplicate. I realized I sent my previous email from
> the wrong address]
>
> I am trying to find a way to use scipy/numpy to optimize a piece of code.
> From profiling I already know over 90% of the runtime of the function is in
> 4 lines of code. This code is from selSPEA2 in DEAP in case that matters.
>
> for i in range(N):
> for j in range(1, size - 1):
> if sorted_indices[i,j] == min_pos:
> sorted_indices[i,j:size - 1] = sorted_indices[i,j + 1:size]
> sorted_indices[i,j:size] = min_pos
> break
>
> The inner loop is inherently parallel since it is each iteration does not
> depend on any other. With C++ I would use OpenMP or TBB to parallelize the
> outer loop.
>
> For a more complete picture I have also included the surrounding code. The
> basic problem is if the number of entries (size) is greater than the number
> allowed to be kept (k) for the next generation then the most similar items
> (based on distance) need to be removed until size == k. At each iteration a
> solution with the lowest distance to another solution is removed and then
> the distance from the remove solution to all other solutions is set as
> infinity. This prevents removing all points in a cluster.
>
> Thank you
>
>
> while size > k:
> # Search for minimal distance
> min_pos = 0
> for i in range(1, N):
> for j in range(1, size):
> dist_i_sorted_j = distances[i,sorted_indices[i,j]]
> dist_min_sorted_j = distances[min_pos,sorted_indic
> es[min_pos,j]]
>
> if dist_i_sorted_j < dist_min_sorted_j:
> min_pos = i
> break
> elif dist_i_sorted_j > dist_min_sorted_j:
> break
>
> distances[:,min_pos] = float("inf")
> distances[min_pos,:] = float("inf")
>
> #This part is still expensive but I don't know a better way to do it
> yet.
> #Essentially all remaining time in this function is in this section
> #It may even make sense to do this in C++ later since it is trivially
> parallel
> for i in range(N):
> for j in range(1, size - 1):
> if sorted_indices[i,j] == min_pos:
> sorted_indices[i,j:size - 1] = sorted_indices[i,j + 1:size]
> sorted_indices[i,j:size] = min_pos
> break
>
> # Remove corresponding individual from chosen_indices
> to_remove.append(min_pos)
> size -= 1
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at python.org
> https://mail.python.org/mailman/listinfo/scipy-user
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-user/attachments/20180124/79477bca/attachment.html>
More information about the SciPy-User
mailing list