[Numpy-discussion] Need help for implementing a fast clip in numpy (was slow clip)

Thu Jan 11 00:41:55 EST 2007

Christopher Barker wrote:
>
>
> A. M. Archibald wrote:
>
>> Why not write the algorithm in C? 
>
> I did just that a while back, for Numeric. I've enclosed the code for 
> reference.
>
> Unfortunately, I never did figure out an efficient way to write this 
> sort of thing for all types, so it only does doubles. Also, it does a 
> bunch of special casing for discontiguous vs. contiguous arrays, and 
> clipping to an array vs a scalar for the min and max arguments.

Doing the actual clipping when the datatypes are 'native' is trivial in
C: a single loop, a comparison, that's it. I have become an irrational
C++ hater over time, so I don't use templates (and I don't think C++ is
welcome in core numpy). For those easy cases of template use, autogen
works well enough for me; my impression is that numpy uses a similar
system, e.g. for ufuncs. You can look at
scipy/Lib/sandbox/cdavid/src/levinson1d.tpl for an example of using
autogen to generate a function for any datatype you want; if you need
fancier template facilities like partial specialization and other crazy
C++ things that mere mortals like me will never understand, then I am
not sure autogen can be used.
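
As a minimal sketch of that single loop (this is neither the Numeric
code quoted above nor numpy's actual implementation): the macro below is
a plain C preprocessor stand-in for the autogen template approach, and
the names DEFINE_CLIP, clip_double, clip_float and clip_int are
hypothetical. It assumes native element types, a contiguous buffer, and
scalar bounds with lo <= hi.

```c
#include <stddef.h>

/* Hypothetical stand-in for a template: expands to one in-place clip
 * function per element type. Assumes lo <= hi. */
#define DEFINE_CLIP(NAME, TYPE)                                 \
    void NAME(TYPE *data, size_t n, TYPE lo, TYPE hi)           \
    {                                                           \
        size_t i;                                               \
        for (i = 0; i < n; ++i) {                               \
            if (data[i] < lo) {                                 \
                data[i] = lo;   /* below range: raise to lo */  \
            } else if (data[i] > hi) {                          \
                data[i] = hi;   /* above range: lower to hi */  \
            }                                                   \
        }                                                       \
    }

/* One instantiation per datatype we care about. */
DEFINE_CLIP(clip_double, double)
DEFINE_CLIP(clip_float, float)
DEFINE_CLIP(clip_int, int)
```

Calling clip_double(a, n, 0.0, 1.0) would then clip a buffer of n
doubles in place to the range [0, 1].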

I guess the method used in numpy is the better choice for core
functionality, as it avoids the burden of installing one more
development tool.

Now, I didn't know that clip was supposed to handle arrays as min/max
values. At first, I didn't understand the need to care about contiguous
vs. non-contiguous arrays; having non-scalar min/max makes a special
case for non-contiguous arrays necessary. But again, it is important not
to lose sight of the goal, which was to have faster clipping for
matplotlib. Those cases are easy, because they involve native types and
scalar min/max, where contiguity does not matter since we traverse the
input array element by element. If we pass non-native-endian,
non-contiguous arrays, there is actually a pretty good chance that the
current implementation is already fast enough and does not need to be
changed anyway.
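
To illustrate why contiguity does not matter with scalar bounds, here is
a hedged sketch of a one-dimensional strided loop. The function name
clip_double_strided is hypothetical, the stride is assumed to be given
in bytes (as numpy reports strides), and the elements are assumed to be
native-endian, properly aligned doubles.

```c
#include <stddef.h>

/* Clip n doubles in place, stepping `stride` bytes between consecutive
 * elements. For a contiguous array, stride == sizeof(double).
 * Assumes lo <= hi and aligned, native-endian data. */
void clip_double_strided(char *base, size_t n, ptrdiff_t stride,
                         double lo, double hi)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        /* Address of the i-th logical element, wherever it sits. */
        double *p = (double *)(base + (ptrdiff_t)i * stride);
        if (*p < lo) {
            *p = lo;
        } else if (*p > hi) {
            *p = hi;
        }
    }
}
```

The loop body is the same comparison as in the contiguous case; only the
address computation changes, which is why scalar min/max needs no
special casing here.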

Thanks for the suggestion and the clarifications,

David