[Numpy-discussion] Toward release 1.0 of NumPy

Wed Apr 12 21:59:01 EDT 2006

Travis Oliphant wrote:

> Tim Hochberg wrote:
>
>> Travis Oliphant wrote:
>>
>>> Travis Oliphant wrote:
>>>
>>>>
>>>> The next release of NumPy will be 0.9.8
>>>>
>>>> Before this release is made,  I want to make sure the following 
>>>> tickets are implemented
>>>>
>>>> http://projects.scipy.org/scipy/numpy/ticket/54
>>>> http://projects.scipy.org/scipy/numpy/ticket/55
>>>> http://projects.scipy.org/scipy/numpy/ticket/56
>>>
>>>
>>>
>>>
>>>
>>> So you don't have to read each one individually:
>>>
>>>
>>> #54 :  implement thread-based error-handling modes
>>> #55 :  finish scalar-math implementation which recognizes same 
>>> error-handling
>>> #56 :  implement rich_comparisons on string arrays and unicode arrays.
>>
>>
>>
>> I'll help with #54 at least, since I was the complainer, er I mean, 
>> since I brought that one up. It's probably better to get that started 
>> before #55 anyway. The open issues that I see connected to this are:
>
>
> Great.  I agree that #54 needs to be done before #55 (error handling 
> is what's been holding up #55 the whole time).
>
>>
>>    1. Better support for catching integer divide by zero. That 
>> doesn't work at all here,
>
>
> Probably a platform/compiler issue.   The numarray equivalent code had 
> an if statement to prevent the compiler from optimizing it away.  
> Perhaps we need to do something like that.   Also, perhaps VC7 has 
> some means to set the divide by zero error more directly and we can 
> just use that.
>
>> I'm guessing because my optimizer is too smart. I spent a half hour 
>> this morning trying to work out how to set the divide by zero flag directly using 
>> VC7, but I couldn't find anything. I suppose I could see if there's 
>> some pragma to turn off optimization around that one function. 
>> However, I'm interested in what you think of stuffing the integer 
>> divide by zero information directly into a flag on the thread local 
>> object and then checking it on the way out. 
>
>
>
> Hmm..   The only issue is that dictionary look-ups are more expensive 
> than register look-ups.    This could be costly.
>
>
>> This is cleaner in that it doesn't rely on platform specific flag 
>> setting ifdeffery and it allows us to consider issue #2.
>>
>>    2. Breaking integer divide by zero out from floating point divide 
>> by zero. The former is more serious in that it's silent. The latter 
>> returns INF, so you can see that something happened by examining your 
>> results, while the former returns zero. That has much more potential 
>> for confusion and silent bugs. Thus, it seems reasonable to be able 
>> to set the error handling different for integer divide by zero and 
>> floating point divide by zero. Note that this would allow integer 
>> divide by zero to be set to 'raise' and still run all the FP ops at 
>> max speed, since the flag saying do no error checking could ignore 
>> the int_divide_by_zero setting.
>
>
>
> Interesting proposal.    Yes, it is true that integer division 
> returning zero is less well-justified.   But, I'm still concerned with 
> doing a dictionary lookup for every divide-by-zero, and (more 
> importantly) to check to see if a divide-by-zero has occurred.   The 
> dictionary lookups are the largest source of small-array slow-down when 
> comparing Numeric to NumPy.

Well, assuming that we can fix the error flag setting code here, we 
could still break the divide by zero error handling out by doing some 
special casing in the ufunc machinery, since the ufuncs presumably can 
figure out their own types. Still, the thread local storage option is 
cleaner if we can figure out a way to make the dictionary lookups fast 
enough. The lookup in the failing case is not a big deal, I don't think. 
First, it's normally an error, so I don't mind introducing some slowdown. 
Second, it should be easy to only do the lookup once. Just have a flag 
that ensures that after the first lookup, the divide by zero flag is 
not set a second time. I guess the bigger issue is the lookup on the way 
out to see if anything failed. I have a plan, which I'll present at the 
bottom.

>>
>>   3. Tossing out the overflow checking on integer operations. It's 
>> incomplete anyway and it slows things down. I don't really expect my 
>> integer operations to be overflow checked, and personally I think 
>> that incomplete checking is worse than no checking. I think we should 
>> at least disable the support for the time being and possibly revisit 
>> this later when we have time to do a complete job and if it seems 
>> necessary.
>
>
> I'm all for that.   I think it makes the code slower and because it is 
> incomplete (addition and subtraction don't do it), it makes for 
> harder-to-explain code.
>
> On the scalar operations, we should check for over-flow, however...

OK.

>
>>
>>   4. Different defaults. I'd like to enable different defaults without 
>> slowing things down in the really super fast case.
>
>
>
> The discussion on different defaults is fine.   The slow-down is that 
> with the current defaults, the error register flags are not actually 
> checked if the default has not been changed.    With the 
> numarray-defaults, the register flags would be checked at the end of 
> each 1-d loop.   I'm not sure what kind of slow-down that would 
> bring.   Certainly for 1-d cases, there would be little difference.
>
> One could actually simply store different defaults (but it would 
> result in minor slow-downs because the register flags would be checked).
>
OK, here's my plan. It sounds like it will work, but this threading 
business is always tricky so find holes in it if you can.

1. As we've discussed we grow some thread local storage. This storage 
has flags check_divide, check_over, check_under, check_invalid and 
check_int_divide. It also has a flag int_divide_err. These flags are 
initialized to False, but then may immediately be set to a different 
default value. This is to simplify #3.

2. We grow 6 static longs that correspond to the above and are 
initialized to zero. They should be called check_divide_count, etc. or 
something similar.

3. Whenever a flag is switched from False to True its corresponding 
global is incremented. Similarly, when switched from True to False the 
global is decremented.

4. When an integer divide by zero occurs, we check the int_divide_err 
flag. If it is false, we set it to true and also increment 
int_divide_err_count. We also set a local flag so that we don't do this 
again in that call to the ufunc core function. We can actually skip this 
whole step if check_int_divide_count is zero.

With all that in place, I think we should be able to do things 
efficiently. The ufunc can check whether any of the check_XXX_counts are 
nonzero and turn on register flag checking as appropriate. If an error 
occurs, it still only has to go to the per thread dictionary if the 
count for that particular error type is nonzero. Similarly, if the count 
int_divide_err_count is nonzero, the ufunc will have to go to the 
dictionary. If the error was set in this thread, then appropriate action 
(including possibly nothing) is taken and int_divide_err_count is 
decremented.

That all sounds more complicated than it really is, at least in my head 
;) Anyway, try to find the holes in it. It should be able to run at full 
speed if you turn off error checking in all threads. It should run at 
almost full speed as long as there aren't any errors that are being 
checked in *any thread*. I think in practice this means that most of the 
speed hit that is seen in numarray won't be here. It doesn't actually 
matter what the defaults are; turning off all error checking will still 
be fast.

Regards,

-tim
