[Numpy-discussion] Multiple Boolean Operations
Andrea Gavana
andrea.gavana at gmail.com
Sat May 24 13:48:50 EDT 2008
Hi All,
On Fri, May 23, 2008 at 4:50 PM, Andrea Gavana wrote:
> Hi Peter & All,
>
> On Fri, May 23, 2008 at 4:16 PM, Peter Creasey wrote:
>> Hi Andrea,
>>
>> 2008/5/23 "Andrea Gavana" <andrea.gavana at gmail.com>:
>>> And so on. The probelm with this approach is that I lose the original
>>> indices for which I want all the inequality tests to succeed:
>>
>> To have the original indices you just need to re-index your indices, as it were
>>
>> idx = flatnonzero(xCent >= xMin)
>> idx = idx[flatnonzero(xCent[idx] <= xMax)]
>> idx = idx[flatnonzero(yCent[idx] >= yMin)]
>> idx = idx[flatnonzero(yCent[idx] <= yMax)]
>> ...
>> (I haven't tested this code, apologies for bugs)
>>
>> However, there is a performance penalty for doing all this re-indexing
>> (I once fell afoul of this), and if these conditions "mostly" evaluate
>> to True you can often be better off with one of the solutions already
>> suggested.
>
> Thank you for your answer. I have tried your suggestion, and the
> performances are more or less comparable with the other NumPy
> implementations (yours is roughly 1.2 times slower than the others),
> but I do gain some advantage when the subgrids are very small (i.e.,
> most of the values in the first array are already False). I'll go and
> implement your solution when I have many small subgrids in my model.
I have added a few more implementations for this issue. One think that
stroke me was that I could actually use sorted arrays for my xCent,
yCent and zCent vectors, so I came up with a solution which uses
numpy.searchsorted but the speed is more or less as it was without
sorting. The specific function is this:
def MultipleBoolean13():
""" Numpy solution 5 (Andrea). """
searchsorted = numpy.searchsorted
indxStart, indxEnd = searchsorted(xCentS, [xMin, xMax])
indyStart, indyEnd = searchsorted(yCentS, [yMin, yMax])
indzStart, indzEnd = searchsorted(zCentS, [zMin, zMax])
xInd = numpy.zeros((nCells), dtype=numpy.bool)
yInd = xInd.copy()
zInd = xInd.copy()
xInd[xIndices[indxStart:indxEnd]] = True
yInd[yIndices[indyStart:indyEnd]] = True
zInd[zIndices[indzStart:indzEnd]] = True
xyzReq = numpy.nonzero(xInd & yInd & zInd)[0]
Where xCentS, yCentS and zCentS are the sorted arrays. I don't see any
easy way to optimize this method, so I'd like to know if it is
possible to code it better than I did. I have done some testing and
timings of all the solutions we came up until now (12
implementations), and this is what I get (please notice the nice work
or ascii art :-D :-D ):
*******************
* SUMMARY RESULTS *
*******************
---------------------------------------------------------------------
Number Of Cells: 50000
---------------------------------------------------------------------
| Rank | Method Name | Execution Time | Relative Slowness |
---------------------------------------------------------------------
1 NumPy 5 (Andrea) 0.00562803 1.00000
2 NumPy 1 (Francesc) 0.00563365 1.00100
3 NumPy 2 (Nathan) 0.00577322 1.02580
4 NumPy 4 (Nathan-Vector) 0.00580577 1.03158
5 Fortran 6 (James) 0.00660514 1.17361
6 Fortran 3 (Alex) 0.00709856 1.26129
7 Fortran 5 (James) 0.00748726 1.33035
8 Fortran 2 (Mine) 0.00748816 1.33051
9 Fortran 1 (Mine) 0.00775906 1.37864
10 Fortran 4 {Michael) 0.00777685 1.38181
11 NumPy 3 (Peter) 0.01253662 2.22753
12 Cython (Stefan) 0.01597804 2.83901
---------------------------------------------------------------------
---------------------------------------------------------------------
Number Of Cells: 100000
---------------------------------------------------------------------
| Rank | Method Name | Execution Time | Relative Slowness |
---------------------------------------------------------------------
1 NumPy 5 (Andrea) 0.01080372 1.00000
2 NumPy 2 (Nathan) 0.01109147 1.02663
3 NumPy 1 (Francesc) 0.01114189 1.03130
4 NumPy 4 (Nathan-Vector) 0.01214118 1.12380
5 Fortran 6 (James) 0.01351264 1.25074
6 Fortran 5 (James) 0.01368450 1.26665
7 Fortran 3 (Alex) 0.01373010 1.27087
8 Fortran 2 (Mine) 0.01415306 1.31002
9 Fortran 1 (Mine) 0.01425558 1.31951
10 Fortran 4 {Michael) 0.01443192 1.33583
11 NumPy 3 (Peter) 0.02463268 2.28002
12 Cython (Stefan) 0.04298108 3.97836
---------------------------------------------------------------------
---------------------------------------------------------------------
Number Of Cells: 150000
---------------------------------------------------------------------
| Rank | Method Name | Execution Time | Relative Slowness |
---------------------------------------------------------------------
1 NumPy 1 (Francesc) 0.01613255 1.00000
2 NumPy 5 (Andrea) 0.01619734 1.00402
3 NumPy 2 (Nathan) 0.01647855 1.02145
4 NumPy 4 (Nathan-Vector) 0.01779452 1.10302
5 Fortran 3 (Alex) 0.02064676 1.27982
6 Fortran 2 (Mine) 0.02382278 1.47669
7 Fortran 4 {Michael) 0.02404563 1.49050
8 Fortran 5 (James) 0.02734487 1.69501
9 Fortran 6 (James) 0.02762538 1.71240
10 Fortran 1 (Mine) 0.03028402 1.87720
11 NumPy 3 (Peter) 0.03625735 2.24746
12 Cython (Stefan) 0.07515276 4.65845
---------------------------------------------------------------------
---------------------------------------------------------------------
Number Of Cells: 200000
---------------------------------------------------------------------
| Rank | Method Name | Execution Time | Relative Slowness |
---------------------------------------------------------------------
1 NumPy 5 (Andrea) 0.02187359 1.00000
2 NumPy 1 (Francesc) 0.02309221 1.05571
3 NumPy 2 (Nathan) 0.02323452 1.06222
4 NumPy 4 (Nathan-Vector) 0.02378610 1.08743
5 Fortran 3 (Alex) 0.02792134 1.27649
6 Fortran 2 (Mine) 0.03119301 1.42606
7 Fortran 4 {Michael) 0.03221007 1.47256
8 Fortran 5 (James) 0.03584257 1.63862
9 Fortran 6 (James) 0.03627464 1.65838
10 Fortran 1 (Mine) 0.04048422 1.85083
11 NumPy 3 (Peter) 0.04765184 2.17851
12 Cython (Stefan) 0.09927396 4.53853
---------------------------------------------------------------------
---------------------------------------------------------------------
Number Of Cells: 250000
---------------------------------------------------------------------
| Rank | Method Name | Execution Time | Relative Slowness |
---------------------------------------------------------------------
1 NumPy 5 (Andrea) 0.02651608 1.00000
2 NumPy 1 (Francesc) 0.02864898 1.08044
3 NumPy 2 (Nathan) 0.02933160 1.10618
4 NumPy 4 (Nathan-Vector) 0.02960466 1.11648
5 Fortran 3 (Alex) 0.03427299 1.29254
6 Fortran 2 (Mine) 0.03848291 1.45130
7 Fortran 4 {Michael) 0.03919467 1.47815
8 Fortran 6 (James) 0.04430585 1.67091
9 Fortran 5 (James) 0.04765620 1.79726
10 Fortran 1 (Mine) 0.04914228 1.85330
11 NumPy 3 (Peter) 0.05981760 2.25590
12 Cython (Stefan) 0.12797177 4.82619
---------------------------------------------------------------------
---------------------------------------------------------------------
Number Of Cells: 300000
---------------------------------------------------------------------
| Rank | Method Name | Execution Time | Relative Slowness |
---------------------------------------------------------------------
1 NumPy 5 (Andrea) 0.03365729 1.00000
2 NumPy 1 (Francesc) 0.03470315 1.03107
3 NumPy 2 (Nathan) 0.03519601 1.04572
4 NumPy 4 (Nathan-Vector) 0.03529574 1.04868
5 Fortran 3 (Alex) 0.04167365 1.23818
6 Fortran 2 (Mine) 0.04653522 1.38262
7 Fortran 4 {Michael) 0.04711222 1.39976
8 Fortran 6 (James) 0.05287650 1.57103
9 Fortran 5 (James) 0.05686667 1.68958
10 Fortran 1 (Mine) 0.05913682 1.75703
11 NumPy 3 (Peter) 0.07183072 2.13418
12 Cython (Stefan) 0.15876811 4.71720
---------------------------------------------------------------------
******************************
* ELAPSED TIME: 64.828 Seconds *
******************************
Just in case someone is interested in it, I attach the source code of
the implementations I have so far. Thank you vey much for your
suggestions!
Andrea.
"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: MultipleBoolean.py
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080524/b82c8aea/attachment.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MultipleBooleanCython.pyx
Type: application/octet-stream
Size: 1157 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080524/b82c8aea/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: numpy.pxi
Type: application/octet-stream
Size: 3554 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080524/b82c8aea/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: python.pxi
Type: application/octet-stream
Size: 647 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080524/b82c8aea/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MultipleBooleanFortran.f90
Type: application/octet-stream
Size: 3621 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080524/b82c8aea/attachment-0003.obj>
More information about the NumPy-Discussion
mailing list