[SciPy-Dev] adding chkfinite flags to linalg functions

Fri Aug 26 14:45:45 EDT 2011

On 08/26/2011 12:02 PM, Christopher Jordan-Squire wrote:
> On Fri, Aug 26, 2011 at 10:38 AM, Bruce Southey<bsouthey at gmail.com>  wrote:
>> On 08/26/2011 09:40 AM, Nathaniel Smith wrote:
>>
>> Would calling it check_finite instead be too horrible?
>>
>> - Nathaniel
>>
>> On Aug 26, 2011 2:47 AM, "Ralf Gommers"<ralf.gommers at googlemail.com>  wrote:
>>> Hi,
>>>
>>> https://github.com/scipy/scipy/pull/48 adds a chkfinite flag to
>>> scipy.linalg
>>> functions that allows for disabling a check on whether there are infs/nans
>>> in the array, which can be a speed gain if you already know that there's
>>> no
>>> infs/nans in your input array.
>>>
>>> Is anyone opposed to merging this?
>>>
>>> Ralf
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>
>> I do oppose this because I do not think it is the correct solution to an
>> apparent issue of computational speed. I do say apparent because there is no
>> code and no timings to support it.
>>
> Some timing data is at the pull request link. (I was comparing the
> then-current chkfinite against a modification Ralf suggested.)
>
> "I tried that (calling it chkfinite_mod) on an ndarray A of shape
> (10000,500). (The extreme size was mostly to make the differences
> clear. The issues in my application had been appearing when A was
> (3000,2).) The results of timeit are as follows.
>
> np.asarray_chkfinite(A): 153 ms
> chkfinite_mod(A): 59.3 ms
> np.asarray(A): 1.9 us
>
> So asarray_chkfinite is 75,000 times as slow, and chkfinite_mod is
> 30,000 times as slow. That's enough of a change that I'm going to make
> the small modification to numpy and make a pull request, but it still
> isn't big enough.

There is no example or timing code so this is meaningless.
Also  'asarray_chkfinite' is numpy not scipy so it is not relevant here.
Also how many different computers have you tested this on that gave this 
difference? Hint one is not enough.

> To give a feel for the sort of issue I'm thinking of, let's say that A
> and b are ndarrays of shape (100000,5) and b is (100000,). Then
> currently a call to lstsqr produces (using prun in ipython) that the
> entire solve takes 0.060s, and 0.017 seconds is spent in
> asarray_chkfinite. That's a huge amount of time wasted verifying the
> input, proportionally, so I'd really like orders of magnitude
> reduction in the time taken to pass the input on to the solver.
> Especially since I want to use these things in situations where this
> solver is being called thousands of times over and over on data that's
> already been verified as finite. "
I do understand the time issue but that is your very specific 
situation.  You have to provide the sample code that replicates this as 
are using it might be just your environment and computer.

>> The proposed argument is really not a check but to avoid something that has
>> to be done in order to obtain a valid outcome.  Hence a possible reason why
>> the original author put this check in these functions.  If we want to avoid
>> this then we need to change the workflow by breaking this into two steps
>> rather than this approach.
>>
> The check's aren't something that must be done to obtain a valid
> outcome. They're a way to ensure the input is valid. If you know for
> other reasons that your input is valid then it's a waste of resources.
Yes, I do understand that but I do not think that you are the average 
scipy user nor is your situation typical of scipy.
> Breaking it into two steps has its own issues associated with it as
> well. Then the user must verify their own inputs before passing them
> to scipy linalg methods. If I understand your suggestion correctly.
That is exactly what your patch is doing because people will misuse that 
argument just because they perceive it as faster.

> Why would that be a better approach?
I currently have no problems with the scipy approach as what you 
indicate below regarding lapack.
>> All this patch does is say if you want to go 'fast' then set
>> 'chkfinite=False' regardless of type of input and fails to address any
>> issue. We quite often see on numpy/scipy lists where a user has made
>> incorrect assumptions about their input which is will not catch or catch at
>> the end.
>>
>> I do not know if Chris provided any tickets for the issues he pointed out
>> in:
>> http://mail.scipy.org/pipermail/scipy-dev/2011-July/016386.html
>> "But if you pass a numpy array with inf's in it, then it hangs."
>>
>> This should have a ticket with an clear easy example because it highlights a
>> large problem.
>>
> It has to do with the underlying Lapack call, if IIRC. Not something
> in numpy/scipy.
So what?
File a bug report so people know it is a problem and can fix it!
If it is in Lapack then it needs to be pushed upstream. So in the 
meantime we do have a work around it. But this patch permits users to 
incorrectly remove that work around.

>> I also agree with Nathaniel that the name of the argument is bad but
>> check_finite would not solve my issue with the name.
>>
> chkfinite was used because of the underlying numpy function being
> bypassed. I don't have a strong preference, but that does make it
> easier to relate the option to the actual code.
That is poor reason for a name as only someone who knows the code will 
understand the meaning - users don't read documentation :-)

>> Yes, I am being negative because this is a community and it needs to be
>> shown that this patch will improve scipy as whole.
>>
>> Bruce
>>
>>

Bruce