Unexpected behaviour of math.floor, round and int functions (rounding)

Chris Angelico rosuav at gmail.com
Sat Nov 20 18:22:38 EST 2021


On Sun, Nov 21, 2021 at 10:01 AM Avi Gross via Python-list
<python-list at python.org> wrote:
> Computers generally use finite methods, sometimes too finite. Yes, the
> problem is not Mathematics as a field. It is how humans often generalize or
> analogize from one area into something a bit different. I do not agree with
> any suggestion that a series of bits that encodes a result that is rounded
> or truncated is CORRECT. A representation of 0.3 in a binary version of some
> floating point format is not technically correct. Storing it as 3/10 and
> carefully later multiplying it by 20 and then carefully canceling part will
> result in exactly 6. While storing it digitally and then multiplying it in
> registers or whatever by 20 may get a result slightly different than the
> storage representation of 6.0000000000... and that is a fact and risk we
> generally are willing to take.

Do you accept that storing the floating point value 1/4, then
multiplying by 20, will give precisely 5? Because that is
*guaranteed*. You don't have to expect a result "slightly different"
from 5, it will be absolutely exactly five:

>>> (1/4) * 20 == 5.0
True
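
If you want to see that for yourself, the fractions module will show
you the exact value a float carries; 1/4 is stored perfectly, while the
0.3 from your example is not:

>>> from fractions import Fraction
>>> Fraction(1/4)
Fraction(1, 4)
>>> Fraction(0.3)
Fraction(5404319552844595, 18014398509481984)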

This is what I'm talking about. Some numbers can be represented
perfectly, others can't. If you try to represent the square root of
two as a decimal number, then multiply it by itself, you won't get
back precisely 2, because you can't have written out the *exact*
square root of two. But you most certainly CAN write "1.875" on a
piece of paper, and it really truly does exactly mean fifteen eighths.
And you can write that number as a binary float, too, and it'll mean
the exact same value.
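
And again, Python will happily confirm that:

>>> from fractions import Fraction
>>> Fraction(1.875)
Fraction(15, 8)
>>> (1.875).hex()   # the exact bits: 1.875 is hex 1.e, i.e. 1 + 14/16
'0x1.e000000000000p+0'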

> But consider a different example. If I have a filesystem or URL or anything
> that does not care about whether parts are in upper or lower case, then
> "filename" and "FILENAME" and many variations like "fIlEnAmE" are all
> assumed to mean the same thing. A program may even simply store all of them
> in the same way as all uppercase. But when you ask to compare two versions
> with a function where case matters, they all test as unequal! So there are
> ways to ask for a comparison that is approximately equal given the
> constraint that case does not matter:

A URL has distinct parts to it: the domain has some precise folding
done (most notably case folding), the path does not, and you can
consider "http://example.com:80/foo" to be the same as
"http://example.com/foo" because 80 is the default port.

> >>> alpha="Hello"
> >>> beta="hELLO"
> >>> alpha == beta
> False
> >>> alpha.lower() == beta.lower()
> True
>

That's a terrible way to compare URLs, because it's both too sloppy
AND too strict at the same time. But if you have a URL representation
tool, it should be able to recognise two equivalent URLs as equal.
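
Here's a rough sketch of what such a tool might do, using urllib.parse
(the normalized() helper and the exact set of rules here are just my
illustration of the idea):

from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def normalized(url):
    # Fold only the parts defined to be case-insensitive or defaulted;
    # leave the path alone, since it IS case-sensitive.
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""   # .hostname comes back lowercased
    port = parts.port             # None if no explicit port
    if port == DEFAULT_PORTS.get(scheme):
        port = None               # an explicit default port means nothing
    netloc = host if port is None else "%s:%d" % (host, port)
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))

print(normalized("http://Example.COM:80/foo") == normalized("http://example.com/foo"))
# True: host case and the default port don't matter
print(normalized("http://example.com/FOO") == normalized("http://example.com/foo"))
# False: path case does matter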

Floats are representations of numbers that can be compared for
equality if they truly represent the same number. The value 3/6 is
precisely equal to the value 7/14:

>>> 3/6 == 7/14
True

You don't need an "approximately equal" function here. They are the
same value. They are equal.

> I see no reason why a comparison cannot be done like this in cases you are
> concerned with small errors creeping in:
>
> >>> from math import isclose
> >>> isclose(1, .9999999999999999999999)
> True
> >>> isclose(1, .9999999999)
> True
> >>> isclose(1, .999)
> False

This is exactly the problem though: HOW close counts as equal? The
only way to answer that question is to know the accuracy of your
inputs, and the operations done.
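
At least math.isclose forces you to make that choice explicitly: the
default is a relative tolerance of 1e-09, and you can widen it to match
what you actually know about your data (the 1% below is just for
illustration):

>>> from math import isclose
>>> isclose(1, .999)                  # fails the default rel_tol=1e-09
False
>>> isclose(1, .999, rel_tol=0.01)    # passes once you claim only 1% accuracy
True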

> So floats by themselves are not inaccurate but realistically the results of
> operations ARE. I mean if I ask a long number to be stored that does not
> fully fit, it is often silently truncated and what the storage location now
> represents accurately is not my number but the shorter version that is at the
> limit of tolerance. But consider another analogy often encountered in
> mathematics.

Not true. Operations are often perfectly accurate.
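
For instance, every one of these comparisons holds exactly, because
IEEE 754 requires the basic operations to be correctly rounded, and in
each case the true result fits in a float:

>>> 0.5 + 0.25 == 0.75
True
>>> 1.875 * 2 == 3.75
True
>>> 10.0 / 4.0 == 2.5
True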

> If I measure several numbers in the real world such as weight and height and
> temperature and so on, some are considered accurate only to a limited number
> of digits. Your weight on a standard digital scale may well be 189.8 but if
> I add a feather or subtract one, the reading may well shift to one unit up
> or down. Heck, the same person measured just minutes later may shift. If I
> used a deluxe scale that measures to more decimal places, it may get hard to
> get the exact same number twice in a row as just taking a deeper breath may
> make a change.
>
> So what happens if I measure a box in three dimensions to the nearest .1
> inch and decide it is 10.1 by 20.2 by 20.3 inches? What is the volume,
> ignoring pesky details about the width of the cardboard or whatever?
>
> A straightforward multiplication yields 4141.606 cubic inches. You may have
> been told to round that to something like 4141.6 because the potential error
> in each measure cannot result in more precision. In reality, you might even
> calculate two sets of numbers assuming the true width may have been a tad
> more or less and come up with the volume being BETWEEN a somewhat smaller
> number and a somewhat larger number.

If those initial figures were accurate to three digits, you should
round it to 4140 cubic inches, because that's all the accuracy you
have. (Or, if you prefer, 4140 +/- 5.)
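
That "two sets of numbers" calculation is easy enough to sketch,
reading "to the nearest .1 inch" as each measurement being off by up to
0.05:

low  = 10.05 * 20.15 * 20.25   # about 4100.8 cubic inches
high = 10.15 * 20.25 * 20.35   # about 4182.7 cubic inches

The honest answer lies somewhere in that range, which is far wider than
the six digits of 4141.606 would suggest.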

> I claim a similar issue plagues using a computer to deal with stored
> numbers, perhaps not stored 100% perfectly as discussed, and doing
> calculations. The result often comes out more precisely than warranted. I
> suspect there are modules out there that might do multi-step calculations
> where at each step, numbers generated with extra precision are throttled
> back so the extra precision is set to zeroes after rounding to avoid the
> small increments adding up. Others may just do the calculations and keep
> track and remove extra precision at the end.

When your input values aren't accurate, your output won't be accurate.
That's something the computer can never know. When you store the
number 3602879701896397/36028797018963968, did you actually mean that
number, or did you mean some other number that's kinda close to it? If
you don't tell the computer, it's going to assume that you wanted
exactly that number.
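
That long fraction isn't contrived, by the way; it's exactly what you
stored the moment you typed 0.1:

>>> from fractions import Fraction
>>> Fraction(0.1)
Fraction(3602879701896397, 36028797018963968)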

> And again, this is not because the implementation of numbers is in any way
> wrong but because a real-world situation requires the humans to sort of dial
> back how they are used and not over-reach.
>
> So comparing for close-enough equality is not necessarily a reflection on
> floats but on the design not accommodating the precision needed or perhaps
> on the algorithm used not necessarily being expected to reach a certain
> level.

And close-enough equality is the correct thing to do when you know
exactly what the accuracy of your inputs is. If you need to be
completely rigorous about it, you'd have to store every number as a
range (so you might say that your input length is "10.05 to 10.15" or
"10.1, error 0.5") and do all arithmetic on those ranges. What you'd
find is that some operations widen the ranges and others don't. The
trouble is, that's not actually all that useful; Fermi estimates are
far more accurate than they seem like they "should be" because the
balance of probability is in favour of errors cancelling out, at least
partially.
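
A toy sketch of that range arithmetic, just to make it concrete (real
interval arithmetic libraries also handle division, outward rounding,
and so on):

class Interval:
    """Track a [lo, hi] range through arithmetic."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # Addition widens the range by the sum of the two widths.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The extremes of the four corner products (handles signs too).
        corners = (self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi)
        return Interval(min(corners), max(corners))

    def __repr__(self):
        return "[%r, %r]" % (self.lo, self.hi)

box = Interval(10.05, 10.15) * Interval(20.15, 20.25) * Interval(20.25, 20.35)
print(box)   # roughly [4100.8, 4182.7]: multiplication widened the range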

ChrisA

