math.nroot [was Re: A brief question.]

Michael Hudson mwh at python.net
Fri Jul 15 05:10:15 EDT 2005


Tim Peters <tim.peters at gmail.com> writes:

> [Tim]
> >> Ah, but as I've said before, virtually all C compilers on 754 boxes
> >> support _some_ way to get at this stuff.  This includes gcc before C99
> >> and fenv.h -- if the platforms represented in fpectlmodule.c were
> >> happy to use gcc, they all could have used the older gcc spellings
> >> (which are in fpectlmodule.c, BTW, under the __GLIBC__ #ifdef).
> 
> [Michael] 
> > Um, well, no, not really.  The stuff under __GLIBC___ unsurprisingly
> > applies to platforms using the GNU project's implementation of the C
> > library, and GCC is used on many more platforms than just that
> > (e.g. OS X, FreeBSD).
> 
> Good point taken:  pairings of C compilers and C runtime libraries are
> somewhat fluid.
> 
> So if all the platforms represented in fpectlmodule.c were happy to
> use glibc, they all could have used the older glibc spellings. 
> Apparently the people who cared enough on those platforms to
> contribute code to fpectlmodule.c did not want to use glibc, though. 

After a little googling, it seems it may not have been possible for
them: while glibc is theoretically portable to systems other than
Linux, in practice it ain't.

> In the end, I still don't know why there would be a reason to hope
> that an endless variety of other libms would standardize on the C99
> spellings.

Point.  But as you said in the post I replied to, soon (maybe even
now) there won't be an "endless variety of other libms" to worry
about.

> > ...
> >  Even given that, the glibc section looks mighty Intel-specific to me (I don't
> > see why 0x1372 should have any cross-architecture meaning).
> 
> Why not?  I don't know whether glibc ever did this, but Microsoft's
> spelling of this stuff used to, on Alphas (when MS compilers still
> supported Alphas), pick apart the bits and rearrange them into the
> bits needed for the Alpha's FPU control registers.  

Well, I considered that but decided that it was far too insane.  Maybe
that was insufficient cynicism :)

In any case, glibc's docs today only mention the C99 (and C99-like,
for setting traps) interfaces, and I can't be arsed to go through old
docs to see if _FPU_SETCW or __setfpucw were ever documented and if so
what they were documented to do.

> > One thing GCC doesn't yet support, it turns out, is the "#pragma STDC
> > FENV_ACCESS ON" gumpf, which means the optimiser is all too willing to
> > reorder
> > 
> >    feclearexcept(FE_ALL_EXCEPT);
> >    r = x * y;
> >    fe = fetestexcept(FE_ALL_EXCEPT);
> >
> > into
> > 
> >    feclearexcept(FE_ALL_EXCEPT);
> >    fe = fetestexcept(FE_ALL_EXCEPT);
> >    r = x * y;
> > 
> > Argh!  Declaring r 'volatile' made it work.
>  
> Oh, sigh.  One of the lovely ironies in all this is that CPython
> _could_ make for an excellent 754 environment, precisely because it
> does such WYSIWYG code generation. 

One of my motivations here (other than the instantly discountable one
of aesthetic purity :) is to make Python a good system for prototyping
numeric codes.

> Optimizing-compiler writers hate hidden side effects, and every fp
> operation in 754 is swimming in them -- but Python couldn't care
> much less.

Thing is, I don't see why this particular stuff should be that hard to
implement -- if you have an optimizing compiler that can move code
around, presumably you also have some way of saying what can't be
moved past what else.  But you're not going to catch me diving into
GCC's sources :)  (In other news, passing -frounding-math to GCC may
also have the desired effect, but I haven't tested it.)
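
For concreteness, the shape of the thing I've been fighting with is
roughly this (a minimal sketch only; the pragma is the C99 way of
asking for flag access to be safe, which GCC currently ignores, hence
the 'volatile'):

    #include <fenv.h>
    #include <stdio.h>

    /* C99 says you need this before touching the FP environment;
       GCC warns that it ignores it, which is the whole problem. */
    #pragma STDC FENV_ACCESS ON

    int main(void)
    {
        volatile double r;          /* 'volatile' stops the multiply
                                       being moved past fetestexcept */
        double x = 1e300, y = 1e300;

        feclearexcept(FE_ALL_EXCEPT);
        r = x * y;                  /* overflows */
        if (fetestexcept(FE_OVERFLOW))
            printf("overflow flag set, r = %g\n", r);
        return 0;
    }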

> Anyway, you're rediscovering the primary reason you have to pass a
> double lvalue to the PyFPE_END_PROTECT macro.
> PyFPE_END_PROTECT(v) expands to an expression including the
> subexpression
> 
>     PyFPE_dummy(&(v))
> 
> where PyFPE_dummy() is an extern that ignores its double* argument. 
> The point is that this dance prevents C optimizers from moving the
> code that computes v below the code generated for
> PyFPE_END_PROTECT(v).  Since v is usually used soon after in the
> routine, it also discourages the optimizer from moving code up above
> the PyFPE_END_PROTECT(v) (unless the C compiler does cross-file analysis, it
> has to assume that PyFPE_dummy(&(v)) may change the value of v). 
> These tricks may be useful here too -- fighting C compilers to the
> death is part of this game, alas.

I did read those comments.  Maybe passing the address of the result
variable to the routine that checks the flags and decides whether to
raise an exception would be a good hack (and, yes, writing that
function in another file so GCC doesn't bloody well inline it and then
rearrange all my code).
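
Something along these lines, I mean (a sketch only, and the names are
made up):

    /* fpcheck.c -- deliberately its own translation unit, so GCC
       can't inline it and shuffle the caller's code around it */
    #include <fenv.h>

    int check_fp_flags(double *result)
    {
        (void)result;   /* only taken so callers must pass &result */
        return fetestexcept(FE_OVERFLOW | FE_DIVBYZERO | FE_INVALID);
    }

    /* caller.c -- a different file */
    #include <fenv.h>

    extern int check_fp_flags(double *result);

    double checked_mul(double x, double y)
    {
        double r;
        feclearexcept(FE_ALL_EXCEPT);
        r = x * y;
        if (check_fp_flags(&r)) {   /* &r pins the multiply above here */
            /* here's where you'd raise OverflowError & friends */
        }
        return r;
    }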

> PyFPE_END_PROTECT() incorporates an even stranger trick, and I wonder
> how gcc deals with it.  The Pentium architecture made an agonizing
> (for users who care) choice:  if you have a particular FP trap enabled
> (let's say overflow), and you do an fp operation that overflows, the
> trap doesn't actually fire until the _next_ fp operation (of any kind)
> occurs.  You can honest-to-God have, e.g., an overflowing fp add on an
> Intel box, and not learn about it until a billion cycles after it
> happened (if you don't do more FP operations over the next billion
> cycles).

I read those comments too.

One thing that I know I don't know is how the introduction of things
like SSE (and particularly SSE2 and SSE3) instructions to the Pentium
architecture affects all this.  Are they only applicable to vector
stuff?

> So "the other thing" PyFPE_END_PROTECT does is force a seemingly
> pointless double->int conversion (it always coerces 1.0 to an int),
> just to make sure that a Pentium will act on any enabled trap that
> occurred before it.
> 
> If you have in mind just testing flags (and staying away from enabling
> HW traps -- and this is the course I recommend), 

For one thing, C99 doesn't, AFAICT, say *anything* about enabling HW
traps (it seems glibc defines the C99-like functions feenableexcept/
fedisableexcept/fegetexcept to do this).
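
For the record, using those looks something like this (glibc-specific,
needs _GNU_SOURCE, and entirely untested anywhere else):

    /* glibc-only sketch: make overflow deliver SIGFPE as a HW trap */
    #define _GNU_SOURCE
    #include <fenv.h>

    void enable_overflow_trap(void)
    {
        feenableexcept(FE_OVERFLOW);    /* glibc extension, not C99 */
    }

    void disable_overflow_trap(void)
    {
        fedisableexcept(FE_OVERFLOW);   /* ditto */
    }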

I'm not even sure how to enable HW traps in assembly on PPC (the only
architecture I actually know anything about at this level) because it
seems to involve twiddling bits in the Machine State Register and you
can't execute the mtmsr instruction in user state (I guess there's
some kernel call to do it).

> this shouldn't matter (the sticky status flag is set immediately,
> it's only triggering the corresponding trap that's delayed).  I
> haven't studied C99 deeply enough to determine whether it has weasel
> words allowing traps to be delayed indefinitely, but that kind of
> HW-driven compromise is common in the C standards.

See above, maybe.

> Not to imply that this isn't all dead easy <wink>.

Heh.

Cheers,
mwh

-- 
  <AdamV> SamB: PHP's basic control structure is the "database 
          timeout error".                       -- from Twisted.Quotes


