[Python-Dev] Numerical robustness, IEEE etc.

Fri Jun 23 00:14:09 CEST 2006

Michael Hudson <mwh at python.net> wrote:
> 
> Maybe append " for me, at least" to what I wrote then.  But really, it
> is hard: because Python runs on so many platforms, and platforms that
> no current Python developer has access to.  If you're talking about
> implementing FP in software (are you?), then I guess it gets easier.

No, I am not.  And it isn't as hard as is currently made out.

> > My intentions are to provide some numerically robust semantics,
> > preferably of the form where straightforward numeric code (i.e. code
> > that doesn't play any bit-twiddling tricks) will never invoke
> > mathematically undefined behaviour without it being flagged.  See
> > Kahan on that.
> 
> That doesn't actually explain the details of your intent very much.

Let's try again.  You say that you are a mathematician.  The
standard floating-point model is that it maps functions defined on
the reals (sometimes complex) to approximations defined on floating-
point.  The conventional interpretation was that any operation that
was not mathematically continuous in an open region including its
argument values (in the relevant domain) was an error, and that all
such errors should be flagged.  That is what I am talking about.
It's all classic behaviour - nothing unusual.
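To make "flagged" concrete: current CPython (3.x, an assumption, since the 2006 interpreter differed in detail) is inconsistent here. The math module raises for domain errors and overflow, while plain float arithmetic silently returns infinities and NaNs. The `probe` helper is hypothetical, written only for this illustration:

```python
import math

def probe(thunk):
    """Evaluate thunk() and report either its value or the exception raised."""
    try:
        return repr(thunk())
    except (ValueError, OverflowError, ZeroDivisionError) as exc:
        return type(exc).__name__

results = {
    "math.sqrt(-1.0)": probe(lambda: math.sqrt(-1.0)),    # domain error: flagged
    "math.exp(1000.0)": probe(lambda: math.exp(1000.0)),  # overflow: flagged
    "1.0 / 0.0": probe(lambda: 1.0 / 0.0),                # division by zero: flagged
    "1e308 * 10.0": probe(lambda: 1e308 * 10.0),          # overflow: NOT flagged
    "inf - inf": probe(lambda: math.inf - math.inf),      # invalid: NOT flagged
}

for expr, outcome in results.items():
    print(f"{expr:18} -> {outcome}")
```

The last two rows are exactly the unflagged, mathematically undefined cases described above.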

> > Not a lot.  Annex F in itself is only numerically insane.  You need to
> > know the rest of the standard, including that which is documented only
> > in SC22WG14 messages, to realise the full horror.
> 
> That's not why I was mentioning it.  I was mentioning it to give the
> idea that I'm not a numerical expert but, for example, I know what a
> denorm is.

Unfortunately, that doesn't help, because it is not where the issues
are.  What I don't know is how much you know about numerical models,
IEEE 754 in particular, and C99.  You weren't active on the SC22WG14
reflector, but there were some lurkers.

> > The problem with such things is that they relate to the interfaces
> > between types, and it is those aspects where object-orientation
> > falls down so badly.  For example, consider conversion between float
> > and long - which class should control the semantics?
> 
> This comment seems not to relate to anything I said, or at least not
> obviously.

I am afraid that it did.  I pointed out that some of the options
needed to control the behaviour of the implicit conversions between
built-in classes.  Now, right back in the Simula days, those issues
were one of the reasons for the religious war between the multiple
inheritance people and those who thought it was anathema.  My claim
is that such properties need to be program-global, or else you will
have the most almighty confusion.

You can take the Axiom approach of having a superclass to which
such things are bound, but most programming languages have always
had difficulty with properties that aren't clearly associated with
a single class - ESPECIALLY when they affect primitives.
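The float/long question is not hypothetical. In current CPython (again an assumption about a later interpreter than the one under discussion), each operation at the int/float boundary applies its own conversion policy, so no single class controls the semantics. The `outcome` helper is hypothetical:

```python
big = 10 ** 400

def outcome(thunk):
    """Return thunk()'s value, or the name of the conversion error it raised."""
    try:
        return thunk()
    except (OverflowError, ValueError) as exc:
        return type(exc).__name__

# Mixed arithmetic converts the int to float first, so an int beyond the
# float range makes the whole expression fail.
mixed_add = outcome(lambda: big + 1.0)          # 'OverflowError'

# Mixed comparison does not convert: CPython compares int and float exactly.
mixed_cmp = outcome(lambda: big > 1e308)        # True

# float -> int conversion has its own, separate rules for non-finite values.
from_nan = outcome(lambda: int(float("nan")))   # 'ValueError'
from_inf = outcome(lambda: int(float("inf")))   # 'OverflowError'

# 2**53 + 1 is not representable as a double, so float(n) rounds to 2**53,
# and the exact comparison then reports the two as unequal.
n = 2 ** 53 + 1
exact_eq = n == float(n)                        # False

print(mixed_add, mixed_cmp, from_nan, from_inf, exact_eq)
```

Whichever of these policies one prefers, they are properties of the conversion, not of either class alone.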

> >> This could be implemented by having a field in the threadstate of FPU  
> >> flags to check after each fp operation (or each set of fp operations,  
> >> possibly).  I don't think I have the guts to try to implement  
> >> anything sensible using HW traps (which are thread-local as well,  
> >> aren't they?).
> >
> > Gods, NO!!!
> 
> Good :-)

!!!!!  I am sorry, but that isn't an appropriate response.  The fact
is that they are unspecified - IDEALLY, things like floating-point
traps would be handled thread-locally (and thus neither change context
nor affect other cores, as was done on the Ferranti Atlas and many
other machines), but things like TLB miss traps, device interrupts
and machine-check interrupts need to be CPU-global.  Unfortunately,
modern architectures use a single mechanism for all of them - which
is a serious design error.
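Checking sticky flags in software avoids the question of whether hardware traps are thread-local at all. Below is a minimal sketch of the threadstate-flags idea from the quoted text, assuming a pure-Python wrapper. `flags`, `checked` and `worker` are hypothetical names, and real code would read the hardware status flags (for example with C99's `fetestexcept`) rather than infer them from the result:

```python
import math
import threading

# Hypothetical stand-in for "a field in the threadstate": sticky flags kept
# in thread-local storage, so each thread sees only its own conditions.
_state = threading.local()

def flags():
    if not hasattr(_state, "flags"):
        _state.flags = set()
    return _state.flags

def checked(op, *args):
    """Apply a float operation, then set sticky flags in the calling thread.

    Conditions are inferred from the result, which is a simplification.
    """
    result = op(*args)
    if math.isnan(result) and not any(math.isnan(a) for a in args):
        flags().add("invalid")      # NaN produced from non-NaN operands
    elif math.isinf(result) and all(math.isfinite(a) for a in args):
        flags().add("overflow")     # infinity produced from finite operands
    return result

def worker(out):
    checked(lambda a, b: a - b, math.inf, math.inf)   # inf - inf
    out["worker"] = sorted(flags())

out = {}
t = threading.Thread(target=worker, args=(out,))
t.start()
t.join()
checked(lambda a, b: a * b, 1e308, 10.0)              # overflows to inf
main_flags = sorted(flags())
print(out["worker"], main_flags)   # ['invalid'] ['overflow']
```

Each thread ends up with only the condition it raised, which is the behaviour one would want, and cannot rely on, from hardware traps.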

> > Sorry, but I have implemented such things (but that was on a far
> > architecture, and besides the system is dead).  Modern CPU
> > architectures don't even DEFINE whether interrupt handling is local
> > to the core or chip, and document that only in the release notes,
> > but what is clear is that some BLACK incantations are needed in
> > either case.
> 
> Well, I only really know about the PowerPC at this level...

Do you?  Can you tell me whether interrupts stop the core or chip,
for each of the classes of interrupt, and exactly what the incantation
is for changing to the other mode?

> > Think of taking a machine check interrupt on a multi-core,
> > highly-pipelined architecture and blench.  And, if that is an
> > Itanic, gibber hysterically before taking early retirement on the
> > grounds of impending insanity.
> 
> What does a machine check interrupt have to do with anything?

Because a machine check is one of the classes of interrupt that you
POSITIVELY want the other cores stopped until you have worked out
whether it impacts just the interrupted core or the CPU as a whole.
Inter alia, the PowerPC architecture takes one when a core has just
gone AWOL - and there is NO WAY that the dead core can handle the
interrupt indicating its own demise!

> > Oh, that's the calm, moderate description.  The reality is worse.
> 
> Yes, but fortunately irrelevant...

Unfortunately, it isn't.  I wish that it were :-(

> Now, a more general reply: what are you actually trying to achieve
> with these posts?  I presume it's more than just make wild claims
> about how much more you know about numerical programming than anyone
> else...

Sigh.  What I am trying to get is floating-point support of the form
that, when a programmer makes a numerical error (see above), he gets
EITHER an exception value returned OR an exception raised.  I do, of
course, need to exclude the cases when the code is testing states
explicitly, twiddling bits and so on.
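Python's decimal module (decimal rather than binary floating point, and not something either poster proposed) already offers this choice in software. Each signal is either trapped, raising an exception, or left untrapped, returning a special value and setting a sticky flag, and the current context is per-thread. A minimal sketch, where `divide` is a hypothetical helper:

```python
import decimal
from decimal import Decimal

SIGNALS = (decimal.InvalidOperation, decimal.DivisionByZero)

def divide(a, b, trap):
    """Divide two Decimals, either raising (trap=True) or returning a
    special value and recording a sticky flag (trap=False)."""
    # localcontext() works on a copy of the current (per-thread) context,
    # so these settings do not leak outside the with-block.
    with decimal.localcontext() as ctx:
        ctx.clear_flags()
        for sig in SIGNALS:
            ctx.traps[sig] = trap
        try:
            value = a / b
        except decimal.InvalidOperation:
            return "raised InvalidOperation"
        except decimal.DivisionByZero:
            return "raised DivisionByZero"
        set_flags = [s.__name__ for s in SIGNALS if ctx.flags[s]]
        return f"{value!s} (flags: {', '.join(set_flags)})"

for trap in (True, False):
    print(divide(Decimal(1), Decimal(0), trap))   # pole: 1/0
    print(divide(Decimal(0), Decimal(0), trap))   # undefined: 0/0
```

With traps on, both undefined operations raise; with traps off, they return Infinity and NaN and leave a flag that later code can test.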

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679