Old Man Yells At Cloud

Steve D'Aprano steve+python at pearwood.info
Sun Sep 17 14:35:45 EDT 2017


On Mon, 18 Sep 2017 03:00 am, Chris Angelico wrote:

>> The distinction between Python floats and real numbers ℝ is a red-herring. It
>> isn't relevant.
> 
> You said:
> 
>>>> (I have a degree in maths, and if we ever
>>>> covered areas where int/int was undefined, it was only briefly, and I've
>>>> long since forgotten it.)
> 
> So what do YOU mean by "int/int" being "undefined"? And I referred to
> real numbers because YOU referred to a degree in maths.

Yes, I referred to a maths degree. I referred to lots of things. And real
numbers are still irrelevant. The problem with Python 2 division is not caused by
the difference between mathematically pure reals and actual Python floats. It is
that a/b would do two *different* things depending on whether both a and b were
ints or not.
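
To make "two different things" concrete, here is a rough Python 3 sketch of how
Python 2's classic / behaved for ints and floats. The name classic_div is made
up for illustration, and the sketch deliberately ignores long, complex and
user-defined types.

```python
def classic_div(a, b):
    """Approximate Python 2's classic ``/`` for ints and floats only.

    Hypothetical helper for illustration, not anything from the stdlib.
    """
    if isinstance(a, int) and isinstance(b, int):
        return a // b   # both ints: floor division, remainder thrown away
    return a / b        # at least one float: true division


print(classic_div(1, 2))      # 0
print(classic_div(1.0, 2))    # 0.5
```

Same operator, same mathematical values, two different answers.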

To answer your question, what do I mean by int/int being undefined, I'd have to
dig into areas of maths that either weren't taught in the undergrad courses I
did, or that I've long since forgotten about. Something about... fields? The
reals and the rationals are both fields, and they're closed under division.
Ints *aren't* a field, because the integers aren't closed under division.

("Closed" means that dividing a rational number by another (non-zero) rational
number gives a rational number, for example.)
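
A small sketch of what closure looks like in practice, using the standard
fractions module to stand in for the rationals. This is an illustration under
that assumption, not a proof of anything.

```python
from fractions import Fraction

a, b = Fraction(11), Fraction(2)
q = a / b                            # rational / rational is still a rational
print(q, isinstance(q, Fraction))    # 11/2 True

# The quotient of two integers need not be an integer: 11/2 has denominator 2.
print(q.denominator == 1)            # False

# Floor division stays inside the integers, but only by dropping the remainder.
print(11 // 2, 11 % 2)               # 5 1
```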

This is a pretty specialised area of maths. You won't learn anything about it in
high school. And possibly not in undergrad maths degrees. I seem to vaguely recall
just barely touching on groups, but not rings or fields. It's perfectly
possible for somebody to do a post-grad degree in maths and not touch this
stuff. Hell, I don't even know if I'm right to say that int/int division
is "undefined". It could be that even mathematicians who work in this area
will say "of course it's defined, you just don't always get an int". Or that
they'll say "the question isn't even wrong".

If you read the Wikipedia page on it:

https://en.wikipedia.org/wiki/Algebraic_structure

you'll see reference to category theory, and monoids, things which Haskell loves
and most other people run screaming from.

The bottom line is, before you can sensibly justify making int/int an illegal
operation (a compile-time error in Haskell) you need to be thinking about some
pretty hairy areas of maths. To say that 11/2 cannot be calculated and isn't
5.5, you're thinking in areas that most mathematically educated people don't
even know exist, let alone care about.


[...]
>> This isn't some theoretical problem that might, maybe, perhaps, be an issue
>> for some people sometimes. It was a regular source of actual bugs leading to
>> code silently returning garbage.
> 
> So why doesn't it return a fractions.Fraction instead? That way, you
> still get "one half" instead of zero, but it's guaranteed to be
> accurate. And having 1/3 be a literal meaning "one third" would avoid
> all the problems of "1/3 + 1/3 + 1/3 != 3/3". What is the
> justification for int/int => float and not rational?

(1) Guido doesn't like fractions for the built-in maths operators, because of
his experience with ABC where quite simple calculations would end up with
bloody enormous fractions with millions of digits in both the numerator and
denominator, running slower and slower, for a number where the actual precision
was maybe three or four decimal places.

(2) Fractions didn't exist in the standard library when true division was
introduced.

(3) Fractions are slow to work with. They were even slower until a few years ago
when Python got a C accelerated version. Floats are much faster.

(4) For many purposes, the arbitrary precision of fractions is *spurious*
precision. Like the old saw says:

"Measure with micrometer, mark with chalk, cut with axe."

You're taking physical quantities which are typically measured to a precision of
three or four decimal places, if not less, then doing calculations on them to a
precision of potentially millions of decimal places. There may be a few places
where that is justified, but as the default behaviour, it's spurious precision
and such overkill as to be ludicrous.

For most purposes, calculating with C doubles is more precision than you'll ever
need in a lifetime, and for the exceptions, well, if you're dealing with
use-cases that a double isn't enough for, you probably know enough to take
alternative steps.

(5) Most people are used to dealing with floating point numbers, from other
languages, from calculators, from school maths. Floats are quirky but familiar.

(6) Fractions are surprising and hard to deal with. Quickly now, without using a
calculator or Python, how big is this number?

2523720122311461/140737488355328

Is it more or less than 50?

Which would you rather see, the above fraction or its decimal equivalent,
17.93211?
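
Points (1) and (6) are easy to see for yourself. The sketch below iterates a
simple quadratic update with exact fractions; the update rule and starting
values are arbitrary choices for illustration, not anything taken from ABC. The
number of digits in the denominator roughly doubles each step, while the value
itself stays between 0 and 1.

```python
from fractions import Fraction

r = Fraction(7, 2)      # arbitrary illustrative constants
x = Fraction(1, 3)
for step in range(1, 11):
    x = r * x * (1 - x)                 # exact rational arithmetic, no rounding
    digits = len(str(x.denominator))    # size of the denominator in digits
    print(step, digits, round(float(x), 4))

# The fraction quoted above, converted to a float for human consumption.
big = Fraction(2523720122311461, 140737488355328)
print(float(big))
```

After only ten steps the denominator runs to hundreds of digits, for a number
you would happily write down to four decimal places.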


But hey, if some languages (Lisp? Scheme?) want to use rationals instead of
floats as their default numeric type, more power to them.


>> Can you demonstrate any failure of dividing two ints n/m which wouldn't
>> equally fail if you called float(n)/float(m)? I don't believe that there is
>> any such failure mode. Forcing the user to manually coerce to floats doesn't
>> add any protection.
> 
> I don't think there is either, but by forcing people to coerce to
> float, you force them to accept the consequences of floats.

And by implicitly coercing to floats, you also force them to accept the
consequences of floats.
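
A quick check of that claim, as a sketch assuming CPython 3 semantics. For ints
that fit exactly in a double, manual coercion gives the same answer as true
division. For very large ints, it is the manual coercion that has the extra
failure mode, since int/int works from the exact integers.

```python
# For ints that are exactly representable as doubles, coercing first
# changes nothing.
for n in range(-50, 51):
    for m in range(1, 51):
        assert n / m == float(n) / float(m)

# For huge ints, true division still gives a sensible answer, while
# converting to float first overflows.
n, m = 10**400, 10**399
print(n / m)                  # 10.0
try:
    float(n) / float(m)
except OverflowError as exc:
    print("manual coercion failed:", exc)
```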


[...]
> Upcasting from one type to a non-superset of that type is problematic.

So you say, but you've failed to demonstrate any practical, real world
consequences that are problematic. You've even agreed that forcing the user to
do their own coercion doesn't add any protection, and that float(n)/float(m)
will fail in exactly the same ways as n/m with true division.

And even if you are right that true division is "problematic", it certainly is
nowhere near as big a problem as the mess that we had when x/y would do
*different operations* according to the types of x and y.

I'm not talking about relatively benign issues with DSLs and custom classes that
defined __div__ and __rdiv__ to do whatever wacky thing they like, like path
objects that used / to concatenate path elements. I'm talking about two of the
most commonly used numeric types, which people expect to be able to interchange
safely for the purposes of numerical calculations. Most people have two dozen
years of schooling, or more, and potentially many more years of experience with
calculators and the like, teaching them that 1/2 and 1.0/2.0 are the same
thing.

Here's a real practical consequence. You have the price of an item, including
GST, and you want to calculate the ex-GST price:

def ex_GST(inc_GST):
    return 10*inc_GST/11

Looks perfectly reasonable. But whether it is correct depends on which division
you get. With true division, it is correct:

py> ex_GST(122)  # with true division
110.9090909090909

which of course you would round to $110.91. But without true division, it
silently fails, and you lose almost a dollar:

py> ex_GST(122)  # with Python 2 integer division
110

(Don't hassle me about using binary floats instead of Decimals. The decimal
module didn't exist yet, and even if it did, sometimes using binary floats is
simply good enough, e.g. when using Python as a calculator.)

The problem is that before the swap to true division, that code *actually* meant
something like this:

def ex_GST(inc_GST):
    if isinstance(inc_GST, int):
        return divmod(10*inc_GST, 11)[0]  # throw away any remainder
    else:
        return 10*inc_GST/11  # include the remainder


which is not what anyone wanted. And because this was a silent error, giving
garbage results instead of a clean exception, this sort of bug could hide deep
in the middle of calculations for a long time.
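
For anyone who wants to reproduce the two results side by side on Python 3,
here is a minimal sketch. The names ex_GST_true and ex_GST_classic are invented
for this illustration, and ex_GST_classic models only what Python 2 did when it
was handed an int.

```python
def ex_GST_true(inc_GST):
    # True division, as in Python 3 (or Python 2 with the __future__ import).
    return 10 * inc_GST / 11


def ex_GST_classic(inc_GST):
    # What Python 2's classic / did for an int argument: floor division.
    return 10 * inc_GST // 11


price = 122
print(round(ex_GST_true(price), 2))     # 110.91
print(ex_GST_classic(price))            # 110

# The silent loss, in dollars, from the int-only code path.
print(round(ex_GST_true(price) - ex_GST_classic(price), 2))   # 0.91
```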

I got burned by this, many times, as did many other people. It was an awful bug
magnet and a bad design. Compared to that, even Haskell's "purity over
practicality" decision to ban the / operator on integers completely is sensible.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



