Unexpected behaviour of math.floor, round and int functions (rounding)

Avi Gross avigross at verizon.net
Sat Nov 20 17:59:43 EST 2021


Chris,

I generally agree with your comments, albeit with a somewhat different slant.

What I meant is that people who learn mathematics (as I and many here
obviously did) can come away with idealized ideas that they then expect to
be replicable everywhere. But there are grey areas along the way, where some
mathematical proofs do weird things like IGNORE parts of a calculation by
arguing that they go to zero much faster than the other parts, then wave a
mathematical wand about what happens as a limit like zero is approached,
and voila, we have "proved" that the derivative of X**2 is 2*X, or the more
general result that the derivative of A*(X**N) is N*A*(X**(N-1)), and then
extended that to N being negative, fractional, transcendental, and beyond.

Computers generally use finite methods, sometimes too finite. Yes, the
problem is not Mathematics as a field. It is how humans often generalize or
analogize from one area into something a bit different. I do not agree with
any suggestion that a series of bits that encodes a result that is rounded
or truncated is CORRECT. A representation of 0.3 in a binary version of some
floating point format is not technically correct. Storing it as the ratio
3/10 and later carefully multiplying it by 20 and canceling will result in
exactly 6, while storing it digitally and then multiplying it in registers
by 20 may get a result slightly different from the storage representation
of 6.0000000000... That is a fact, and a risk we are generally willing to
take.
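
Python can demonstrate both routes. A quick sketch using the standard
fractions module for the exact rational arithmetic described above:

>>> from fractions import Fraction
>>> Fraction(3, 10) * 20        # exact rational arithmetic: exactly six
Fraction(6, 1)
>>> Fraction(1, 10) * 3 == Fraction(3, 10)
True
>>> 0.1 * 3 == 0.3              # the float route picks up rounding on the way
False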

But consider a different example. If I have a filesystem or URL or anything
that does not care about whether parts are in upper or lower case, then
"filename" and "FILENAME" and many variations like "fIlEnAmE" are all
assumed to mean the same thing. A program may even simply store all of them
in the same way as all uppercase. But when you ask to compare two versions
with a function where case matters, they all test as unequal! So there are
ways to ask whether two values are approximately equal given the constraint
that case does not matter:

>>> alpha="Hello"
>>> beta="hELLO"
>>> alpha == beta
False
>>> alpha.lower() == beta.lower()
True

I see no reason why a comparison cannot be done like this in cases where
you are concerned about small errors creeping in:

>>> from math import isclose
>>> isclose(1, .9999999999999999999999)
True
>>> isclose(1, .9999999999)
True
>>> isclose(1, .999)
False
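
The default is a relative tolerance of 1e-09, and isclose() also accepts
rel_tol and abs_tol keyword arguments for when your notion of "close
enough" is looser or tighter than that:

>>> isclose(1, .999, rel_tol=0.01)    # within one percent is fine here
True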

I will agree with you that binary is not any more imprecise than base 10;
it is just that computer hardware that works with binary is much easier to
design.

So floats by themselves are not inaccurate, but realistically the results
of operations on them ARE. I mean, if I ask for a long number to be stored
that does not fully fit, it is silently rounded or truncated, and what the
storage location now accurately represents is not my number but the shorter
version that is at the limit of tolerance.
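
You can see exactly which shorter version got stored by handing a float to
the decimal module, which renders the stored binary value precisely:

>>> from decimal import Decimal
>>> Decimal(0.1)    # the double nearest to 1/10, rendered exactly
Decimal('0.1000000000000000055511151231257827021181583404541015625')

But consider another analogy often encountered in mathematics.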

If I measure several numbers in the real world such as weight and height and
temperature and so on, some are considered accurate only to a limited number
of digits. Your weight on a standard digital scale may well read 189.8, but
add a feather or remove one and the reading may shift one unit up or down.
Heck, the same person measured just minutes later may get a different
reading. If I
used a deluxe scale that measures to more decimal places, it may get hard to
get the exact same number twice in a row as just taking a deeper breath may
make a change. 

So what happens if I measure a box in three dimensions to the nearest .1
inch and decide it is 10.1 by 20.2 by 30.3 inches? What is the volume,
ignoring pesky details about the width of the cardboard or whatever?

A straightforward multiplication yields 6181.806 cubic inches. You may have
been told to round that to something like 6181.8 because the potential error
in each measurement cannot result in more precision. In reality, you might
even calculate two sets of numbers, assuming each true dimension may have
been a tad more or less, and come up with the volume being BETWEEN a
somewhat smaller number and a somewhat larger number.

I claim a similar issue plagues using a computer to deal with stored
numbers, perhaps not stored 100% perfectly as discussed, and doing
calculations. The result often comes out with more precision than is
warranted. I suspect there are modules out there that do multi-step
calculations where, at each step, numbers generated with extra precision
are throttled back, rounding away the spurious digits so the small
increments cannot add up. Others may just do the calculations, keep track,
and remove the extra precision at the end.
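
Python's standard decimal module works the first way: every operation
rounds its result back to the precision set in the context, so spurious
digits never accumulate. A small sketch with the box numbers from above:

>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 6    # keep at most six significant digits per step
>>> Decimal("10.1") * Decimal("20.2") * Decimal("30.3")
Decimal('6181.81')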

And again, this is not because the implementation of numbers is in any way
wrong but because a real-world situation requires the humans to sort of dial
back how they are used and not over-reach.

So needing a close-enough comparison is not necessarily a reflection on
floats but on a design that does not accommodate the precision needed, or
perhaps on an algorithm that was never expected to reach a certain level of
accuracy.



-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon.net at python.org> On
Behalf Of Chris Angelico
Sent: Saturday, November 20, 2021 5:17 PM
To: python-list at python.org
Subject: Re: Unexpected behaviour of math.floor, round and int functions
(rounding)

On Sun, Nov 21, 2021 at 8:32 AM Avi Gross via Python-list
<python-list at python.org> wrote:
>
> This discussion gets tiresome for some.
>
> Mathematics is a pristine world that is NOT the real world. It handles 
> near-infinities fairly gracefully but many things in the real world 
> break down because our reality is not infinitely divisible and some 
> parts are neither contiguous nor fixed but in some sense wavy and 
> probabilistic or worse.

But the purity of mathematics isn't the problem. The problem is people's
expectations around computers. (The problem is ALWAYS people's
expectations.)

> So in any computer, or computer language, we have realities to deal 
> with when someone asks for say the square root of 2 or other 
> transcendental numbers like pi or e or things like the sin(x) as often 
> they are numbers which in decimal require an infinite number of digits 
> and in many cases do not repeat. Something as simple as the fractions 
> for 1/7, in decimal, has an interesting repeating pattern but is otherwise
infinite.
>
> .142857142857142857 ... ->> 1/7
> .285714285714285714 ... ->> 2/7
> .428571 ...
> .571428 ...
> .714285 ...
> .857142 ...
>
> No matter how many bits you set aside, you cannot capture such numbers 
> exactly IN BASE 10.

Right, and people understand this. Yet as soon as you switch from base
10 to base 2, it becomes impossible for people to understand that 1/5 now
becomes the exact same thing: an infinitely repeating expansion for the
rational number.
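
You can even see the cut-off repeating expansion directly: float.hex()
shows the stored bits, the same way the repeating ".142857" is visible for
1/7 in base 10:

>>> (0.2).hex()    # binary 0.001100110011... rounded off after 53 bits
'0x1.999999999999ap-3'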

> You may be able to capture some such things in another base but then 
> yet others cannot be seen in various other bases. I suspect someone 
> has considered a data type that stores results in arbitrary bases and 
> delays evaluation as late as possible, but even those cannot handle many
numbers.

More likely it would just store rationals as rationals - or, in other words,
fractions.Fraction().

> So the reality is that most computer programming is ultimately BINARY 
> as in BASE 2. At some level almost anything is rounded and imprecise. 
> About all we want to guarantee is that any rounding or truncation done 
> is as consistent as possible so every time you ask for pi or the 
> square root of 2, you get the same result stored as bits. BUT if you 
> ask a slightly different question, why expect the same results? 
> sqrt(2) operates on the number 2. But
> sqrt(6*(1/3)) first evaluates 1/3 and stores it as bits then 
> multiplies it by the bit representation of 6 and stores a result which 
> then is handed to
> sqrt() and if the bits are not identical, there is no guarantee that 
> the result is identical.

This is what I take issue with. Binary doesn't mean "rounded and imprecise".
It means "base two". People get stroppy at a computer's inability to
represent 0.3 correctly, because they think that it should be perfectly
obvious what that value is. Nobody's bothered by
sqrt(2) not being precise, but they're very much bothered by 1/10 not
"working".

> Do note pure Mathematics is just as confusing at times. The number 
> .99999999... where the dot-dot-dot notation means go on forever, is 
> mathematically equivalent to the number 1 as is any infinite series 
> that asymptotically approaches 1 as in
>
>         1/2 + 1/4 + 1/8 + ... + 1/(2**N) + ...
>
> It is not seen by many students how continually appending a 9 can ever 
> be the same as a number like 1.00000 since every single digit is 
> always not a match. But the mathematical theorems about limits are now 
> well understood and in the limit as N approaches infinity, the two 
> come to mean the same thing.

Mathematics is confusing. That's not a problem. To be quite frank, the real
world is far more confusing than the pristine beauty that we have inside a
computer. The problem isn't the difference between reality and mathematics,
or between reality and computers, or anything like that; the problem, as
always, is between people's expectations and what computers do.

Tell me: if a is equal to b and b is equal to c, is a equal to c?
Mathematicians say "of course it is". Engineers say "there's no way you can
rely on that". Computer programmers side with whoever makes most sense right
this instant.
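
To be clear, float == itself IS transitive; it is the "close enough"
comparisons that are not, which is one reason they cannot blindly replace
equality. For example, with math.isclose and its default tolerance:

>>> from math import isclose
>>> a, b, c = 1.0, 1.0 + 9e-10, 1.0 + 1.8e-9
>>> isclose(a, b), isclose(b, c), isclose(a, c)
(True, True, False)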

> So, what should be stressed, and often is, is to use tools available 
> that let you compare numbers for being nearly equal.

No. No no no no no. You don't need to use a "nearly equal" comparison just
because floats are "inaccurate". It isn't like that. It's this exact
misinformation that I am trying to fight, because floats are NOT inaccurate.
They're just in binary, same as everything that computers do.

> I note how unamused I was when making a small table in EXCEL (note, not
> Python) of credit card numbers and balances when I saw the darn credit
> card numbers were too long and a number like:
> card numbers were too long and a number like:
>
> 4195032150199578
>
> was displayed by EXCEL as:
>
> 4195032150199570
>
> It looks like I just missed having significant stored digits and EXCEL 
> reconstructed it by filling in a zero for the missing extra. The 
> problem is I had to check balances sometimes and copy/paste generated 
> the wrong number to use. I ended up storing the number as text using 
> '4195032150199578 as I was not doing anything mathematical with it and 
> this allowed me to keep all the digits as text strings can be quite long.
>
> But does this mean EXCEL is useless (albeit some think so) or that the
> tool can only be used up to some extent and beyond that, can 
> (silently) mislead you?

Oh, Excel is moronic in plenty of other ways.

https://www.youtube.com/watch?v=yb2zkxHDfUE

> Having said all that, this reminds me a bit about the Y2K issues where 
> somehow nobody thought much about what happens when the year 2000 
> arrives and someone 103 years old becomes 3 again as only the final 
> two digits of the year are stored. We now have the ability to make 
> computers with increased speed and memory and so on and I wonder if 
> anyone has tried to make a standard for say a 256-byte storage for 
> multiple-precision floating point that holds lots more digits of 
> precision as well as allowing truly huge exponents. Of course, it may 
> not be practical to have computers that have registers and circuitry 
> that can multiply two such numbers in a very few cycles, and it may be 
> done in stages in thousands of cycles, so use of something big like that
might not be a good default.
>

Yes, you could use 80-bit floats, 128-bit floats, or 256-bit floats, but
that won't change the fact that 0.3 can't be represented precisely in
binary, nor will it change the fact that 0.5 *can*. If people can't think in
binary, they won't think in binary with more bits either.
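
float.hex() makes the contrast plain:

>>> (0.5).hex()    # one half is a power of two: the expansion terminates
'0x1.0000000000000p-1'
>>> (0.3).hex()    # three tenths repeats forever in binary and gets cut off
'0x1.3333333333333p-2'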

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
