[Numpy-discussion] Decision tree-like algorithm on numpy arrays
Martin Raspaud
martin.raspaud at smhi.se
Thu May 6 02:50:33 EDT 2010
Hi all,
I have an old C extension that I want to remove from my code in favour of numpy,
but it looks kind of tricky to me.
Here is the thing:
I have a number of arrays of the same shape.
On these arrays, I run a sequence of tests, leading to a kind of decision tree.
In the end, I get a number of result arrays in which each element gets a value
determined by the outcome of these tests.
The way to do this in an efficient way with numpy is quite unclear to me.
My first thought would be:
result_array1 = np.where(some_test_on(array1),
                         np.where(some_test_on(array2),
                                  1,
                                  2),
                         np.where(some_test_on(array3, array4),
                                  np.where(some_test_on(array5),
                                           3,
                                           4),
                                  4))
result_array2 = np.where(some_test_on(array1),
                         np.where(some_test_on(array2),
                                  True,
                                  True),
                         np.where(some_test_on(array3, array4),
                                  np.where(some_test_on(array5),
                                           True,
                                           False),
                                  True))
etc... but that means running the same tests several times, which is not
acceptable if the tests are lengthy.
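One way to avoid repeating the tests with the np.where approach is to evaluate
each test once, build a single "leaf index" array for the tree, and then map
that index to each result array through a small lookup table. The following is
only a sketch under assumptions: some_test_on here is a hypothetical stand-in
(the real tests are not shown in this post), the input arrays are random
placeholders, and the names t1, t2, t34, t5, leaf, values1 and values2 are
invented for illustration.

```python
import numpy as np

# Hypothetical stand-in for the real tests, which are not shown in the post:
# True where every given array is positive, element by element.
def some_test_on(*arrays):
    return np.all([a > 0 for a in arrays], axis=0)

# Placeholder input data, all of the same shape.
rng = np.random.default_rng(0)
shape = (4, 5)
array1, array2, array3, array4, array5 = (
    rng.standard_normal(shape) for _ in range(5)
)

# Evaluate each test exactly once, on the full arrays.
t1 = some_test_on(array1)
t2 = some_test_on(array2)
t34 = some_test_on(array3, array4)
t5 = some_test_on(array5)

# Encode the decision tree as one leaf index (0..4) per element.
leaf = np.where(t1,
                np.where(t2, 0, 1),
                np.where(t34,
                         np.where(t5, 2, 3),
                         4))

# One lookup table per result array: leaf index -> output value.
values1 = np.array([1, 2, 3, 4, 4])
values2 = np.array([True, True, True, False, True])

result_array1 = values1[leaf]
result_array2 = values2[leaf]
```

Adding a further result array then costs one more lookup table and one more
indexing operation. The trade-off is that every test still runs on every
element, including elements whose branch never needs that test.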
In order to avoid this problem I could also have some mask based on each test:
mask1 = some_test_on(array1)
mask2 = some_test_on(array2[mask1])
mask3 = some_test_on(array3[~mask1], array4[~mask1])
mask4 = some_test_on(array5[~mask1][mask3])
result_array1[mask1][mask2] = 1
result_array1[mask1][~mask2] = 2
result_array1[~mask1][mask3][mask4] = 3
result_array1[~mask1][mask3][~mask4] = 4
result_array1[~mask1][~mask3] = 4
result_array2[mask1][mask2] = True
result_array2[mask1][~mask2] = True
result_array2[~mask1][mask3][mask4] = True
result_array2[~mask1][mask3][~mask4] = False
result_array2[~mask1][~mask3] = True
etc... but that looks a bit clumsy to me...
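A caveat on the mask idea as written: indexing with a boolean array returns a
copy, so a chained assignment such as result_array1[mask1][mask2] = 1 writes
into a temporary and leaves result_array1 unchanged. One possible workaround,
sketched below, keeps every mask at the full array shape and evaluates each
test only on the elements that reach that node of the tree. This assumes the
tests work element by element, so that testing a subset gives the same answer
as subsetting the full test. The helper refine and the stand-in some_test_on
are hypothetical names, not part of numpy or of the original code.

```python
import numpy as np

# Hypothetical stand-in for the real tests, which are not shown in the post:
# True where every given array is positive, element by element.
def some_test_on(*arrays):
    return np.all([a > 0 for a in arrays], axis=0)

# Placeholder input data, all of the same shape.
rng = np.random.default_rng(1)
shape = (4, 5)
array1, array2, array3, array4, array5 = (
    rng.standard_normal(shape) for _ in range(5)
)

def refine(mask, *arrays):
    """Run the test only where `mask` is True.

    Returns two full-size boolean masks: elements selected by `mask` that
    pass the test, and elements selected by `mask` that fail it.
    """
    passed = np.zeros(mask.shape, dtype=bool)
    passed[mask] = some_test_on(*(a[mask] for a in arrays))
    return passed, mask & ~passed

# Walk the tree; each test only sees the elements that reach its node.
everywhere = np.ones(shape, dtype=bool)
m1, not_m1 = refine(everywhere, array1)
m2, not_m2 = refine(m1, array2)
m3, not_m3 = refine(not_m1, array3, array4)
m4, not_m4 = refine(m3, array5)

# The five leaf masks partition the array, so every element gets assigned.
result_array1 = np.empty(shape, dtype=int)
result_array2 = np.empty(shape, dtype=bool)
leaves = [
    (m2, 1, True),
    (not_m2, 2, True),
    (m4, 3, True),
    (not_m4, 4, False),
    (not_m3, 4, True),
]
for mask, value1, value2 in leaves:
    result_array1[mask] = value1  # single-level boolean assignment is in place
    result_array2[mask] = value2
```

Whether this is faster than testing the full arrays depends on how expensive
the tests are compared with the extra indexing and copying.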
The way it was done in the C-extension was to run the decision tree on each
element sequentially, but I have the feeling that would not be very efficient
with numpy (although I know I can't beat pure C code, I would like to have
comparable times).
Do any of you wise people have an opinion on this?
Thanks,
Martin