math.nroot [was Re: A brief question.]

Tim Peters tim.peters at gmail.com
Fri Jul 15 00:06:17 EDT 2005


[Michael Hudson]
>>>>> In what way does C99's fenv.h fail?  Is it just insufficiently
>>>>> available, or is there some conceptual lack?

[Tim Peters]
>>>> Just that it's not universally supported.  Look at fpectlmodule.c for
>>>> a sample of the wildly different ways it _is_ spelled across some
>>>> platforms.

[Michael]
>>> C'mon, fpectlmodule.c is _old_.  Maybe I'm stupidly optimistic, but
>>> perhaps in the last near-decade things have got a little better here.

[Tim]
>> Ah, but as I've said before, virtually all C compilers on 754 boxes
>> support _some_ way to get at this stuff.  This includes gcc before C99
>> and fenv.h -- if the platforms represented in fpectlmodule.c were
>> happy to use gcc, they all could have used the older gcc spellings
>> (which are in fpectlmodule.c, BTW, under the __GLIBC__ #ifdef).

[Michael] 
> Um, well, no, not really.  The stuff under __GLIBC__ unsurprisingly
> applies to platforms using the GNU project's implementation of the C
> library, and GCC is used on many more platforms than just that
> (e.g. OS X, FreeBSD).

Point taken:  pairings of C compilers and C runtime libraries are
somewhat fluid.

So if all the platforms represented in fpectlmodule.c were happy to
use glibc, they all could have used the older glibc spellings. 
Apparently the people who cared enough on those platforms to
contribute code to fpectlmodule.c did not want to use glibc, though. 
In the end, I still don't know why there would be a reason to hope
that an endless variety of other libms would standardize on the C99
spellings.  For backward compatibility, they have to continue
supporting their old spellings too, and then what's in it for them to
supply aliases?  Say I'm SGI, struggling as often as not just to stay
in business.  I'm unlikely to spend what little cash I have to make it
easier for customers to jump ship <wink>.

> ...
>  Even given that, the glibc section looks mighty Intel-specific to me (I don't
> see why 0x1372 should have any x-architecture meaning).

Why not?  I don't know whether glibc ever did this, but Microsoft's
spelling of this stuff used to, on Alphas (when MS compilers still
supported Alphas), pick apart the bits and rearrange them into the
bits needed for the Alpha's FPU control registers.  Saying that bit
0x10 (whatever) is "the overflow flag" (whatever) is as much an
x-platform API as saying that the expansion of the macro FE_OVERFLOW
is "the overflow flag".  Fancy-pants symbolic names are favored by
"computer science" types these days, but real numeric programmers have
always been delighted to wallow in raw bits <wink>.

...

> One thing GCC doesn't yet support, it turns out, is the "#pragma STDC
> FENV_ACCESS ON" gumpf, which means the optimiser is all too willing to
> reorder
> 
>    feclearexcept(FE_ALL_EXCEPT);
>    r = x * y;
>    fe = fetestexcept(FE_ALL_EXCEPT);
>
> into
> 
>    feclearexcept(FE_ALL_EXCEPT);
>    fe = fetestexcept(FE_ALL_EXCEPT);
>    r = x * y;
> 
> Argh!  Declaring r 'volatile' made it work.
 
Oh, sigh.  One of the lovely ironies in all this is that CPython
_could_ make for an excellent 754 environment, precisely because it
does such WYSIWYG code generation.  Optimizing-compiler writers hate
hidden side effects, and every fp operation in 754 is swimming in them
-- but Python couldn't care much less.

Anyway, you're rediscovering the primary reason you have to pass a
double lvalue to the PyFPE_END_PROTECT macro.
PyFPE_END_PROTECT(v) expands to an expression including the
subexpression

    PyFPE_dummy(&(v))

where PyFPE_dummy() is an extern that ignores its double* argument. 
The point is that this dance prevents C optimizers from moving the
code that computes v below the code generated for
PyFPE_END_PROTECT(v).  Since v is usually used soon after in the
routine, it also discourages the optimizer from moving code up above
the PyFPE_END_PROTECT(v) (unless the C compiler does cross-file analysis, it
has to assume that PyFPE_dummy(&(v)) may change the value of v). 
These tricks may be useful here too -- fighting C compilers to the
death is part of this game, alas.

PyFPE_END_PROTECT() incorporates an even stranger trick, and I wonder
how gcc deals with it.  The Pentium architecture made an agonizing
(for users who care) choice:  if you have a particular FP trap enabled
(let's say overflow), and you do an fp operation that overflows, the
trap doesn't actually fire until the _next_ fp operation (of any kind)
occurs.  You can honest-to-God have, e.g., an overflowing fp add on an
Intel box, and not learn about it until a billion cycles after it
happened (if you don't do more FP operations over the next billion
cycles).

So "the other thing" PyFPE_END_PROTECT does is force a seemingly
pointless double->int conversion (it always coerces 1.0 to an int),
just to make sure that a Pentium will act on any enabled trap that
occurred before it.

If you have in mind just testing flags (and staying away from enabling
HW traps -- and this is the course I recommend), this shouldn't matter
(the sticky status flag is set immediately; it's only triggering the
corresponding trap that's delayed).  I haven't studied C99 deeply
enough to determine whether it has weasel words allowing traps to be
delayed indefinitely, but that kind of HW-driven compromise is common
in the C standards.

Not to imply that this isn't all dead easy <wink>.
