Explanation of this Python language feature? [x for x in x for x in x] (to flatten a nested list)

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Apr 4 20:20:38 EDT 2014


On Fri, 04 Apr 2014 11:01:48 -0600, Ian Kelly wrote:

> On Fri, Apr 4, 2014 at 4:08 AM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> On Fri, 04 Apr 2014 02:13:13 -0600, Ian Kelly wrote:
>>
>>> On Fri, Apr 4, 2014 at 1:52 AM, Steven D'Aprano
>>> <steve+comp.lang.python at pearwood.info> wrote:
>>>> py> from decimal import *
>>>> py> getcontext().prec = 16
>>>> py> x = Decimal("0.7777777777787516") py> y =
>>>> Decimal("0.7777777777787518") py> (x + y) / 2
>>>> Decimal('0.7777777777787515')
>>>>
>>>> "Guido, why can't Python do maths???"
>>>
>>> Well, you need to work within the system:
>>>
>>>>>> (5*x + 5*y) / 10
>>> Decimal('0.7777777777787517')
>>>
>>> Actually, I have no idea whether that formula can be relied upon or
>>> the correctness of the above was just luck.
>>
>>
>> And what happens when x+y would have been calculated correctly, but
>> one, or both, of 5*x or 5*y loses catastrophically loses accuracy due
>> to overflow?
>>
>> py> x = 3.1e307
>> py> y = 3.3e307
>> py> (x+y)/2
>> 3.2e+307
>> py> (5*x+5*y)/10
>> inf
>>
>> (I've used regular floats here out of laziness, the same principle
>> applies to Decimals -- there will be *some* number x which is finite,
>> but 5*x overflows to infinity.)
> 
> I thought that Decimals had arbitrary-precision exponents, at least in
> the pure Python version.  Turns out that's wrong; although the
> context.Emax can be set to any int in the pure Python version, it can't
> be removed entirely.  One could just temporarily upgrade the Emax for
> the above calculation, but the pure Python version was made inconvenient
> to use voluntarily in CPython 3.3, and the C version has strict limits.
> 
> In any event, the exponent limits for decimals are much higher than for
> floats (the default Emax is 999999, and it can be set roughly within the
> limits of C integer precision), so any case where you'll get overflow
> with a Decimal is already far beyond the point where you'd have gotten
> overflow with a float.

If you write the most obvious code that works -- usually a good thing in 
computing, but not always so in numeric computing -- and calculate 
(x+y)/2, with Decimal, most of the time the result will be correct but 
sometimes it won't be. In addition to that problem, if the addition 
overflows you will get an inf instead of the correct answer.

But if you write the "tricky" code (5*x+5*y)/10, you may eliminate the 
failure cases, but at the cost that while it will still overflow to inf 
in the same cases as before, now it will also overflow in a whole lot of 
cases that *would have worked correctly* if only you had written the 
obvious code. There's no free lunch here.

You might argue that this doesn't matter, since the problem values for x 
and y have been moved from "scattered values everywhere" to only(?) 
"values so huge that surely nobody will care". Apart, of course, those 
people who do care. Still, I'm sympathetic to the idea that swapping 
"average doesn't work right" for "really humongous numbers overflow 
sooner than they otherwise would have" counts as a win.

But of course, neither of us know what other problems this "tricky" code 
will introduce. Perhaps it fixes the failures for *some* values of x and 
y, but introduces a different set of failures for *other* values of x and 
y. I'm not quite good enough at numeric analysis to rule that out, 
although I haven't tried very hard. (The division by 10 is obviously just 
a shift, so its error is at most 0.5 ULP. But I'm not sure about the 
multiplications by 5.)

But all of this is missing the really important point. Isn't the 
(supposed) move to Decimal-by-default supposed to *simplify* numeric 
calculations, not complicate them? Do we really expect the average non-
expert to write (5*x+5*y)/10 instead of the obvious (x+y)/2?

There's no doubt that binary floating point calculations are tricky 
beasts, and while IEEE-754 semantics mean that they just about always do 
the right thing, still, serious numeric work is half advanced mathematics 
and half voodoo. Base-10 floats don't improve that situation, they make 
it worse.


-- 
Steven D'Aprano
http://import-that.dreamwidth.org/



More information about the Python-list mailing list