[Numpy-discussion] Is there a pure numpy recipe for this?

Wed Mar 26 16:12:20 EDT 2014

On Wed, Mar 26, 2014 at 3:48 PM, Slaunger <Slaunger at gmail.com> wrote:
> I am working on solving a recent recreational mathematical problem on
> Project Euler <http://projecteuler.net>  . I have a solution, which works
> fine for small N up to 10^5 but it takes too long to compute for the actual
> problem, where N is of the order 2*10^7. The problem is nested loops, and I
> am hoping to avoid one level in the loop by using clever numpy magic (as I
> have done often before). However, there is one step, which I cannot figure
> out how to do using numpy operations alone, and I am hoping for some help
>
> The subproblem is that I have in principle k = 1, ..., N sets of boolean
> arrays
> f_k and g_k each of length N.
>
> For each k I need to find the number of elements i where both f_k[i] and
> g_k[i] are True and sum that up over all N values of k.

IIUC,

[~/]
[1]: np.logical_and([True, False, True], [False, False, True])
[1]: array([False, False,  True], dtype=bool)

You can avoid looping over k since they're all the same length

[~/]
[3]: np.logical_and([[True, False],[False, True],[False, True]],
[[False, False], [False, True], [True, True]])
[3]:
array([[False, False],
       [False,  True],
       [False,  True]], dtype=bool)

[~/]
[4]: np.sum(np.logical_and([[True, False],[False, True],[False,
True]], [[False, False], [False, True], [True, True]]), axis=0)
[4]: array([0, 2])


>
> A problem of the order 4*10^14 if I just do it brute force. This takes way
> too long (there is a one minute rule).
>
> However, after a lot of thinking and by using some properties of the f_k and
> g_k I have managed to construct using only pure numpy function and only a
> single loop over k, arrays
>
> f_k_changes_at
> g_k_changes_at
>
> which contain the indices i at which the functions change it boolean value
> from True to False or False to True.
>
> It so happens that the number of changes is only a small fraction of N, the
> fraction decreases with larger N, so the size of these changes_at arrays
> contains perhaps only 1000 elements instead of 10000000 for each k, a
> significant reduction of complexity.
>
> Now, my problem is to figure out for how many values of i both f_k and g_k
> are True given the changes_at arrays.
>
> As this may be a little hard to understand here is a specific example of how
> these arrays can look like for k = 2 and N = 150
>
> f_2_changes_at = [  2   3  39  41  58  59  65  66  93 102 145]
> g_2_changes_at = [  2  94 101 146 149]
>
> with the boundary condition that f_2[0] = g_2[0] = False
>
> Which expands to
> i    f_2    g_2   f_2 and g_2
> 0     F       F                  F
> 1     F       F                  F <-
> 2     T      T                  T <-
> 3     F      T                   F
> 4     F      T                   F
> ...
> 38   F      T                   F <-
> 39   T      T                   T
> 40   T      T                   T <-
> 41   F      T                   F
> 42   F      T                   F
> ...
> 57   F      T                   F <-
> 58   T      T                   T <-
> 59   F      T                   F
> 60   F      T                   F
> ...
> 64   F      T                   F <-
> 65   T      T                   T <-
> 66   F      T                   F
> ...
> 92   F      T                   F <-
> 93   T      T                   T <-
> 94   T      F                   F
> ...
> 100  T     F                   F <-
> 101  T     T                   T <-
> 102  F     T                   F
> ...
> 144  F     T                   F <-
> 145  T     T                   T <-
> 146  T     F                   F
> 147  T     F                   F
> 148  T     F                   F <-
> 149  T     T                   T <-
>
> With the sum of elements fulfilling the condition being (see arrows)
>
> (2 - 1) + (40 - 38) + (58 - 57) + (65 - 64) + (93 - 92) + (101 - 100) + (145
> - 144) + (149 - 148) =
> 1 + 2 + 1 + 1 + 1 + 1 + 1 + 1 = 9
>
> So, is there a numpy recipe for doing the equivalent process without
> expanding it into the full arrays?
>
> I have tried looping over each element in the changes_at arrays and build up
> the sums, but that is too inefficient as I then have an inner for loop
> containing conditional branching code
>
> Thanks in advance, Slaunger
>
>
>
> --
> View this message in context: http://numpy-discussion.10968.n7.nabble.com/Is-there-a-pure-numpy-recipe-for-this-tp37077.html
> Sent from the Numpy-discussion mailing list archive at Nabble.com.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion