[Numpy-discussion] Decision tree-like algorithm on numpy arrays

Thu May 6 02:50:33 EDT 2010

Hi all,

I have an old C extension that I would like to replace with numpy code, but it
looks rather tricky to me.

Here is the situation:
I have a number of arrays of the same shape.
On these arrays I run a sequence of tests that form a kind of decision tree.
Based on the outcome of these tests, each element of several result arrays is
assigned a value.

How to do this efficiently with numpy is quite unclear to me.
My first thought would be:

result_array1 = np.where(some_test_on(array1),
                         np.where(some_test_on(array2),
                                  1,
                                  2),
                         np.where(some_test_on(array3, array4),
                                  np.where(some_test_on(array5),
                                           3,
                                           4),
                                  4))

result_array2 = np.where(some_test_on(array1),
                         np.where(some_test_on(array2),
                                  True,
                                  True),
                         np.where(some_test_on(array3, array4),
                                  np.where(some_test_on(array5),
                                           True,
                                           False),
                                  True))

etc., but that means running the same tests several times, which is not
acceptable if the tests are expensive.
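One way to stop re-running the tests in the nested np.where version is to evaluate each test once over the full arrays, store the boolean results, and reuse them in every np.where tree. A minimal sketch, assuming random inputs and hypothetical comparisons (`> 0.5`, `array3 > array4`) standing in for some_test_on():

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical input arrays, all with the same shape (4, 3).
array1, array2, array3, array4, array5 = rng.random((5, 4, 3))

# Evaluate every test exactly once over the full arrays.
# These comparisons are hypothetical stand-ins for some_test_on().
t1 = array1 > 0.5
t2 = array2 > 0.5
t3 = array3 > array4
t5 = array5 > 0.5

# Reuse the stored masks in as many np.where trees as needed.
result_array1 = np.where(t1,
                         np.where(t2, 1, 2),
                         np.where(t3, np.where(t5, 3, 4), 4))
result_array2 = np.where(t1,
                         np.where(t2, True, True),
                         np.where(t3, np.where(t5, True, False), True))
```

The trade-off is that each test still runs on every element, including elements that would never reach that node of the tree; when a test is expensive, restricting it to the elements that reach its node avoids that cost.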

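To avoid this, I could instead compute one mask per test, running each test only
on the elements that actually reach it:
# Masks are kept full-size: chained indexing such as result_array1[mask1][mask2] = 1
# assigns into a temporary copy and leaves result_array1 unchanged.
mask1 = some_test_on(array1)
mask2 = np.zeros_like(mask1)
mask2[mask1] = some_test_on(array2[mask1])
mask3 = np.zeros_like(mask1)
mask3[~mask1] = some_test_on(array3[~mask1], array4[~mask1])
mask4 = np.zeros_like(mask1)
mask4[~mask1 & mask3] = some_test_on(array5[~mask1 & mask3])

result_array1 = np.empty(array1.shape, dtype=int)
result_array1[mask1 & mask2] = 1
result_array1[mask1 & ~mask2] = 2
result_array1[~mask1 & mask3 & mask4] = 3
result_array1[~mask1 & mask3 & ~mask4] = 4
result_array1[~mask1 & ~mask3] = 4

result_array2 = np.empty(array1.shape, dtype=bool)
result_array2[mask1 & mask2] = True
result_array2[mask1 & ~mask2] = True
result_array2[~mask1 & mask3 & mask4] = True
result_array2[~mask1 & mask3 & ~mask4] = False
result_array2[~mask1 & ~mask3] = True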

etc., but that looks a bit clumsy to me.
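Part of the clumsiness is that every result array repeats the whole tree. A different technique (not from the original post) is to evaluate the tree once into an integer "leaf index" with np.select, and then make each result array a small lookup table indexed by that leaf. A minimal sketch, assuming the tests are cheap enough to run on the full arrays; the comparisons standing in for some_test_on() are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical inputs; the tests below are stand-ins for some_test_on().
array1, array2, array3, array4, array5 = rng.random((5, 1000))

t1 = array1 > 0.5
t2 = array2 > 0.5
t3 = array3 > array4
t5 = array5 > 0.5

# One condition per leaf of the tree, in a fixed order.
# np.select returns, for each element, the choice paired with the first true condition.
leaf_conditions = [
    t1 & t2,          # leaf 0
    t1 & ~t2,         # leaf 1
    ~t1 & t3 & t5,    # leaf 2
    ~t1 & t3 & ~t5,   # leaf 3
    ~t1 & ~t3,        # leaf 4
]
leaf = np.select(leaf_conditions, list(range(len(leaf_conditions))), default=-1)
# The leaves partition the elements, so the default should never be used.
assert np.all(leaf >= 0), "every element must reach a leaf"

# Each output array is a lookup table indexed by leaf number.
values1 = np.array([1, 2, 3, 4, 4])
values2 = np.array([True, True, True, False, True])
result_array1 = values1[leaf]
result_array2 = values2[leaf]
```

With this layout, adding another result array costs one table and one fancy-indexing lookup, instead of another copy of the tree.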

The C extension ran the decision tree on each element sequentially, but I suspect
that doing the same element by element on numpy arrays would be slow (I know I
can't beat pure C code, but I would like comparable run times).
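A middle ground between the per-element C traversal and evaluating every test on every element is to walk the tree once per node, passing down the array of flat indices that reach each node, so every test only runs on its own subset. A minimal sketch, assuming a hypothetical tree format (nested tuples of `(test, if_true, if_false)` with integer leaf ids) and a hypothetical helper `eval_tree`; the lambdas are stand-ins for some_test_on():

```python
import numpy as np

def eval_tree(node, arrays, idx, leaf_out):
    """Route the flat indices `idx` through `node` and store leaf ids in `leaf_out`.

    Hypothetical tree format: a node is either an int (leaf id) or a tuple
    (test, if_true, if_false), where test(arrays, idx) returns a boolean
    array with one entry per index in `idx`.
    """
    if isinstance(node, int):
        leaf_out[idx] = node
        return
    test, if_true, if_false = node
    passed = test(arrays, idx)  # evaluated only on elements reaching this node
    for branch, sub_idx in ((if_true, idx[passed]), (if_false, idx[~passed])):
        if sub_idx.size:  # skip empty branches so no test runs on nothing
            eval_tree(branch, arrays, sub_idx, leaf_out)

# Hypothetical tree with the same shape as the one in the post;
# the lambdas stand in for some_test_on().
tree = (lambda a, i: a[0][i] > 0.5,
        (lambda a, i: a[1][i] > 0.5, 0, 1),
        (lambda a, i: a[2][i] > a[3][i],
         (lambda a, i: a[4][i] > 0.5, 2, 3),
         4))

rng = np.random.default_rng(2)
shape = (4, 3)
arrays = [a.ravel() for a in rng.random((5,) + shape)]  # flat views of the inputs

leaf = np.full(arrays[0].size, -1, dtype=int)
eval_tree(tree, arrays, np.arange(arrays[0].size), leaf)
leaf = leaf.reshape(shape)

# Each output array is then a lookup table indexed by leaf id.
result_array1 = np.array([1, 2, 3, 4, 4])[leaf]
result_array2 = np.array([True, True, True, False, True])[leaf]
```

The Python-level overhead here grows with the number of tree nodes rather than the number of array elements, so for a small tree applied to large arrays most of the time is spent inside the vectorized tests.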

Do any of you wise people have an opinion on this?

Thanks,
Martin