[long and fussy] But what do these operators _mean_?

Edward Jason Riedy ejr at lotus.CS.Berkeley.EDU
Fri Jul 21 14:04:40 EDT 2000


And Huaiyu Zhu writes:
 - >Quick summary:  Maybe there needs to be a numerics-sig before any 
 - >of these changes are adopted...
 - 
 - Yes.

So how do we get one started?  Maybe I can finally have an excuse to
dig into Python's code...  And if there is one, I hope we have some
folks who have to deal with platforms that have problematic IEEE
support (Alphas) or none (Vaxen, but not the Cray vector boxes) on it 
for reality checks.

 - Concerning the semantics of these operators, I think we only need a 
 - general agreement, leaving out algorithmic specifications to various 
 - packages.

I'm not so sure.  My experience on the periphery of the BLAST group 
(the BLAS Technical Forum) has shown that general agreement has to be 
backed up with guaranteed, TESTABLE error bounds to be reliable.

 - The advantage of Python over Matlab really shows here: Only the 
 - association between .* and __dotmul__ would be built into the 
 - language.  

Moderately recent versions of Matlab have operator overloading.  
This has even been used to make a `parallel' Matlab,
  http://www.ai.mit.edu/projects/ppserver/

If anything, that's made the confusion on what things mean worse.

If you don't associate some reasonable error bounds, some well-meaning
soul will quietly provide a heavily optimized routine that works
well for 99% of the cases, but is wrong on 1%.  Like I said, tracking
the resulting bugs down in real code is a terrible ordeal, if you
even happen to find that 1% before it's deployed.
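
For a concrete (invented) example of what I mean: naive summation is 
the classic 99% routine, and compensated summation is the one that 
carries a testable bound.  A sketch, not any real package's code:

  # The naive loop is fine on most data, but can be arbitrarily
  # wrong when large terms cancel.  Neumaier's variant of Kahan's
  # compensated summation costs a few more flops and carries a
  # guaranteed small error bound a test suite can check.
  def naive_sum(xs):
      s = 0.0
      for x in xs:
          s = s + x
      return s

  def compensated_sum(xs):
      s = 0.0
      c = 0.0                          # running compensation
      for x in xs:
          t = s + x
          if abs(s) >= abs(x):
              c = c + ((s - t) + x)    # low bits of x lost in t
          else:
              c = c + ((x - t) + s)    # low bits of s lost in t
          s = t
      return s + c

  data = [1.0, 1e100, 1.0, -1e100]     # exact sum is 2.0
  print naive_sum(data), compensated_sum(data)   # 0.0 vs 2.0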

 - Whichever package you import you'll know its complete specification.

Yes, but what if you didn't import it?  What if you're providing
a routine someone else is using, and they import one with numerically
poor routines?  They then file a bug report, and you blow a _lot_ of
time determining whose problem it is.  This does happen in real life.
And too often the bug report is never resolved, leading to things
like probes crashing into Mars rather than landing gently...

 - The advantage is that this allows natural migration towards better 
 - algorithms without breaking any existing code.  (Cf the migration 
 - from regex to re.)

Ok, I think my examples were misleading.  I don't care about the exact 
implementation.  I care about the error bounds.  In the regex v. re 
case, well, they both had the same error properties: none.  (Ideally.)

 - Imho, the only possible resolution on this is to only provide pure 
 - math definition of these operators in the core language, leaving 
 - numerical specification to the packages that implement them.  

You should not rely on the pure math definition because it assumes 
exactness.

Besides, what is the ``pure math definition'' of some of your operators?
Is left division defined as A^(-1) * B, or as the solution of the
linear equations in A against the right-hand sides in B?  The former
would lead me to believe it's very loose; I'd _never_ use it for
numerical work.  You'll notice that Matlab states how they are 
computed (elimination or least-squares).
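
Here's the difference in Python terms, as a sketch with the 
Numeric/LinearAlgebra modules (a tame 2x2 where both routes agree; 
they part ways, in cost and in error behavior, as A gets worse):

  import Numeric
  from LinearAlgebra import inverse, solve_linear_equations

  A = Numeric.array([[1.0, 0.5], [0.5, 1.0/3.0]])
  b = Numeric.array([1.5, 5.0/6.0])    # exact solution is [1., 1.]

  # Loose reading of left division: explicitly form A^(-1) and
  # multiply.  More work, and it gives up backward stability; the
  # residual can be about cond(A) times larger than a direct solve's.
  x_inv = Numeric.matrixmultiply(inverse(A), b)

  # Numerical reading: factor A and solve the system directly.
  x_solve = solve_linear_equations(A, b)

  print x_inv
  print x_solve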

What I'd shoot for is a statement of exactly what problem is solved 
along with a guaranteed error bound.

 - No one could prove that any of the numerical algorithms used today 
 - would be optimal forever, anyway.

Yes, we can.  The new eigenvalue routine from Dhillon and Parlett _is_ 
theoretically optimal; it's O(n) for n eigenvalues with error bounds 
as small as is possible.  And there's not much to shave off in the
implementation, either.  It might well be the end of the story for
the symmetric case; the damn algorithm just makes so much sense once
it's explained.

The only way to change its optimality would be to attack the 
arithmetic assumptions.  Recent work is leading people to believe 
that there aren't many sensible arithmetics possible.  IEEE754 is 
slightly over-specified, but only to help the user.

On the flip side, matrix inversion is still open.  Kinda neat that 
it's so much harder...  The theoretical minimum is O(n^2) for the 
general case, as you have to fill n^2 entries.  The best ones are 
related to the best matmuls, but I can't remember Winograd's best 
one right now.  Unfortunately, the best ones aren't numerically 
reasonable.  Even Strassen's recursive matmul isn't stable.

And I'm not looking for the best possible error bounds, just currently 
feasible ones.  Those are available, and any better algorithm must
beat them to be considered better.  Also, the error bounds given in
the LAPACK and BLAST docs have been achieved with very fast
implementations, so it's not that much of a burden.

 - Agreed.  Although this is not the topic here, it is really a big 
 - drag. How does Octave handle this on all platforms?

Not well:
> bool
> xisnan (double x)
> {
> #if defined (HAVE_ISNAN)
>   return isnan (x) != 0;
> #else
>   return false;
> #endif
> }

A better test (guess I gotta send in a patch now that I've seen this) 
would be to return !(x==x) instead of false.  That should work on any 
IEEE platform and fall through on any other platform unless the 
compiler does something silly.  Unfortunately, many compilers do.
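
The same idiom is visible from Python, for what it's worth.  A sketch, 
assuming an IEEE platform and a C library underneath that's honest:

  # On an IEEE platform, NaN is the only value that does not
  # compare equal to itself, so x != x is true exactly for NaNs.
  # Everywhere else it's simply false; the trap is a compiler
  # that folds x == x to true at compile time.
  def xisnan(x):
      return x != x

  inf = 1e300 * 1e300      # overflows to an IEEE infinity
  nan = inf - inf          # inf - inf is a NaN
  print xisnan(nan), xisnan(1.0)    # 1 0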

They probably have a reason why they haven't done that; it's a fairly
standard idiom.  I won't have time to test and ask until next Wednesday.  
It's probably the over-zealous compiler issue.

 - I support the idea that Python should use IEEE whenever possible, 
 - as I don't buy Java's philosophy of using the common minimum of 
 - every platform.

THANK YOU.  ;)

However, I can see the sense in providing a common minimum environment
that's good for many, so long as you do not restrict access to better 
features.  The goal is not to turn everyone into numerical analysts,
but rather to make numerical analysts as unnecessary to daily life as
possible.  That takes guaranteed, testable error bounds.  Even though
most users won't know what they mean, test suites can keep 
implementors honest.
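
Here's the flavor of test I mean, as a Numeric sketch.  The constant 
100 below is made up; a real suite would use whatever bound the 
implementor advertises:

  import Numeric
  from LinearAlgebra import solve_linear_equations

  def max_abs(v):
      return Numeric.maximum.reduce(Numeric.absolute(Numeric.ravel(v)))

  # Backward-error check on a linear solve.  For a backward-stable
  # routine, the scaled residual stays near machine epsilon no
  # matter how ill-conditioned A is; users never need to know why.
  def solver_honest(A, b, x, eps=2.0**-52):
      r = b - Numeric.matrixmultiply(A, x)
      return max_abs(r) <= 100.0 * eps * max_abs(A) * max_abs(x)

  A = Numeric.array([[4.0, 1.0], [1.0, 3.0]])
  b = Numeric.array([1.0, 2.0])
  print solver_honest(A, b, solve_linear_equations(A, b))   # 1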

And, imho, the default should always be accuracy before speed.  You
can always compute a wrong answer very quickly.

 - Would a NonIEEEError be possible?  That might help users on those 
 - platforms to invoke their own error handling routines by catching 
 - this exception.

That will take some thought, and these results aren't always errors.
Sometimes they carry important information.  There probably needs to
be a full FP environment system, vaguely like C99's or Common Lisp's.
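
Something with sticky status flags, say.  A toy sketch of the shape; 
every name below is invented, and a real system would read the 
hardware flags instead of simulating them:

  _flags = {}

  def clear_flags():
      _flags.clear()

  def test_flag(name):
      return _flags.get(name, 0)

  def div(x, y):
      # IEEE-flavored division: note the event in a sticky flag
      # and keep going; the caller decides if it was an error.
      if y == 0.0:
          if x == 0.0:
              _flags['invalid'] = 1      # 0/0 would be a NaN
          else:
              _flags['divbyzero'] = 1    # x/0, an infinity
          return None                    # stand-in for NaN/inf
      return x / y

  clear_flags()
  r = div(1.0, 0.0)
  if test_flag('divbyzero'):
      pass      # important information, not necessarily an error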

 - Unlike matlab, which is not object oriented, [...]

It has been for the past few years.  The competition is not standing 
still.

 - For really precision- or efficiency-critical applications, I think 
 - nothing beats a special subclass where the user can fine-tune the 
 - definition of every method.

True, but I feel it's important to have a standard minimum level
of expectation in terms of definitions and error bounds.  People 
who do not meet that for various reasons need to say that loudly,
preferably by using a slightly different interface.
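
In Python terms, the split might look like this; the class and method 
names are invented, and the bodies are placeholders:

  from LinearAlgebra import solve_linear_equations

  class Matrix:
      def __init__(self, data):
          self.data = data
      def solve(self, b):
          # the default meets the standard, testable error bound
          return solve_linear_equations(self.data, b)

  class FastLooseMatrix(Matrix):
      def solve_unchecked(self, b):
          # deliberately NOT named solve(): calling this says,
          # loudly, that speed won over the guaranteed bound
          return solve_linear_equations(self.data, b)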

Keep in mind that I'm not against someone using faster, less accurate
routines when they absolutely know they can.  That can be terribly
necessary and useful in some areas (signal processing comes to mind).  
I'm just for ensuring that the vast majority of people never need to 
think about it.  

Matlab's popular because people don't need to think about many of the 
lower-level details.  Unfortunately, the lower-level details frequently 
come up and bite people.  (Ok, I don't have any numbers to back up 
`frequently', and I only hear of the cases where things fail, but 
still...)  That and the huge number of supporting packages.

Jason


