Builtin Float Epsilon? (was: Re: Does python suck or I am just stupid? )
Carlos Ribeiro
cribeiro at mail.inet.com.br
Sat Feb 22 21:25:20 EST 2003
On Saturday 22 February 2003 17:19, Alex Martelli wrote:
> In practice, I think it boils down to: floating-point is hard, and
> there is no "royal road" that will shield programmers who choose
> to use floating-point from understanding what they're doing, even
> though strange little anomalies will probably keep surfacing. I
> _think_ it follows, as night follows day, that Python should NOT
> foist floating-point on unsuspecting users who do NOT really know
> what they're doing in the matter (over 90% of us, I fear) -- e.g.,
> true division and decimal literals should map to fixed-point or
> rational types, and floating-point should only be used when it is
> required explicitly. Unfortunately Guido disagrees (and his
> opinion trumps mine, of course), because "shielding user from
> floating point" was what ABC, Python's precursor language, did,
> and floating point is SO much faster than the alternatives (as it
> can exploit dedicated hardware present in nearly every computer
> of today) that defaulting to non-floating point for non-integer
> numerical calculations might be perceived by naive users as an
> excessive slowing-down of their programs, if said programs perform
> substantial amounts of numeric computation. Oh well.
Oh well. I've just asked today about fixed point support, and here we are
with exactly one of those situations where fixed point could have saved the
day.
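To make the anomaly concrete: here is the kind of surprise floats produce, and how a decimal/fixed-point type avoids it. I'm using decimal.Decimal below purely as a stand-in for the fixed point type discussed in this thread (no such type exists in the standard library today), so take it as a sketch of the behavior, not the proposal itself.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 exactly, so the error surfaces:
print(0.1 + 0.2)                    # 0.30000000000000004
print(0.1 + 0.2 == 0.3)             # False

# A decimal type keeps exactly the value the user wrote:
print(Decimal('0.1') + Decimal('0.2'))                    # 0.3
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))  # True
```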
I'll wander a little bit now, and I ask everyone to please follow me
carefully. I'm aware of three ways to represent numbers such as the ones given in
this example: floats (which are broken), rationals (which work better for
simple fractions, but are more difficult for the user to understand in the
general case), and fixed point. The third option is the best in my opinion,
but it does suffer from a few problems. One of the main problems, as pointed
out by Alex, is a relative lack of speed when compared to hardware supported
reals. The other one is the fact that floats are the de facto standard for
most languages in regular use today; therefore, using a different
representation for numbers will cause lots of confusion.
The first problem - speed - is now less of an issue than it was some time ago,
when Python was first implemented. With the probable exception of heavy number
crunching (as in NumPy stuff), I think that the software implementation of
fixed point numbers is now quick enough for regular use; I doubt that most
users would ever notice any difference in speed, but it's still something to
take care of.
The second issue - floats as a de facto standard - is now much more important,
because it affects the prospect of using Fixed Point (or decimal) numbers in
a number of ways:
1) the semantics of fixed point arithmetic may cause some surprises, because
most programmers today will expect something such as floats, and may be
surprised by the lack of automatic scaling.
2) it makes it difficult to choose a representation for fixed point numbers that
can be naturally used in a program as a literal. Current implementations ask
for a string to be passed to a special constructor. The problem is that the
most natural representation is already the one used by floats. One has to
come up with a reasonable modifier to allow fixed point literals to be
directly specified, without the need for a special constructor (more on this
later).
3) mixing up numbers of different scales presents a number of issues regarding
the precision of the results. Depending on the situation, the programmer may
be expecting slightly different semantics. However, a lot of work has been done
on this subject, and we don't need to start from scratch.
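Points (2) and (3) can both be illustrated with decimal.Decimal as a stand-in for the fixed point type (again, just a sketch of the semantics being discussed, not the proposed implementation):

```python
from decimal import Decimal

# Point 2: the constructor takes a string, because a float literal has
# already been rounded to binary by the time the constructor sees it:
exact = Decimal('3.1416')    # exactly 3.1416
approx = Decimal(3.1416)     # the nearest binary float, many digits long
print(exact == approx)       # False

# Point 3: mixing scales follows well-defined rules -- addition keeps
# the finer scale, multiplication adds the scales:
print(Decimal('1.00') + Decimal('2.5'))   # 3.50
print(Decimal('1.00') * Decimal('2.5'))   # 2.500
```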
All the problems above have been discussed for a long time not only at c.l.p.
but also elsewhere, and a lot of material does exist dealing with these
issues. This is what leads me to believe that the main problem lies in
item (2) of the list mentioned above - how to naturally represent fixed point
literals in a program (not only in Python but on any given language). This
representation has to be:
- natural, allowing the programmer to both read and write code and immediately
know that a particular literal value is a fixed point number.
- unambiguous, in such a way that a fixed point number could never be mistaken
for a float.
- optional, leaving for the user the option to specify standard floats or
fixed point numbers, depending on the situation.
Unfortunately, any proposal that meets all three requirements will need some special
syntax - and that's really HARD to do, because it will surely get a lot of
resistance, with good reason.
But as I said above, I'm just wandering and talking about random thoughts...
so let us keep traveling down this road. I'm myself relatively convinced of
the need for fixed point numbers, and also that the only viable implementation
needs direct support from the language, including special syntax for fixed
point literals. Now we have a few options to explore:
2.1. Represent fixed point numbers using some modifier in the same way it is
already done with strings (raw and unicode modifiers, for example). Some
possibilites are:
--> 3.1416f4 represents the number 3.1416, with precision 4
--> 1f4 represents the number 1.000, with precision 4
There are some exceptional situations to handle:
--> 1.02f1
option a) round to 1.0 with precision 1
option b) raise an exception
[I sincerely don't know which one is better]
The advantage of this notation is that it builds on the existing support
for scientific (or exponential) notation (as in "1.0e+6"), and it is
therefore easy to parse.
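A sketch of how such a literal could be parsed, again using decimal.Decimal as a stand-in; parse_fixed, its strict flag, and the choice of half-up rounding are all hypothetical, made up here to show both options (a) and (b):

```python
from decimal import Decimal, ROUND_HALF_UP

def parse_fixed(literal, strict=False):
    """Parse a hypothetical '3.1416f4'-style literal: the digits before
    'f' are the value, the digits after are the precision (number of
    decimal places)."""
    mantissa, _, places = literal.partition('f')
    value = Decimal(mantissa)
    quantum = Decimal(1).scaleb(-int(places))   # e.g. Decimal('0.0001')
    rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    if strict and rounded != value:
        # option b) raise an exception instead of silently rounding
        raise ValueError('literal %r loses precision' % literal)
    return rounded

print(parse_fixed('3.1416f4'))   # 3.1416
print(parse_fixed('1f4'))        # 1.0000
print(parse_fixed('1.02f1'))     # option a) rounds to 1.0
# parse_fixed('1.02f1', strict=True) would raise ValueError (option b)
```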
2.2. Using another symbol for the decimal point - for example, the underscore,
as in the examples below:
--> 3_1416 represents the number 3.1416, with precision 4
I really like this notation; it has some big advantages, but also a few
problems. It allows for easier reading (in my opinion), and I think most
users would adapt to it pretty quickly. But it forces the user to specify all
the zeroes in the decimal part, which may or may not be a good idea. For example,
if you are writing literals of very high precision, it's relatively easy to
write the wrong number of zeroes, which may cause errors later when doing
arithmetics. But then, this is not a common situation anyway, and I'm not
sure if this is a real cause for concern.
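The underscore notation could be parsed along the same lines (the function name parse_underscore is hypothetical, and the literal has to be handled as a string here, since this notation would need compiler support to be real syntax):

```python
from decimal import Decimal

def parse_underscore(literal):
    """Parse the hypothetical '3_1416' notation: the underscore plays
    the role of the decimal point, and the number of digits after it
    fixes the precision."""
    whole, _, frac = literal.partition('_')
    return Decimal('%s.%s' % (whole, frac))

print(parse_underscore('3_1416'))   # 3.1416, precision 4
print(parse_underscore('1_0000'))   # 1.0000 -- all zeroes spelled out
```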
Independently of the representation chosen (as in __repr__), there is still the
problem of formatting for printing purposes (as in __str__). For all practical
purposes, the final representation of floats and fixed point numbers will be
similar - after all, ordinary people are going to read the numbers, and so we
must use the standard notation.
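This is in fact how decimal.Decimal behaves today, which suggests the split is workable (a sketch, not part of the original proposal):

```python
from decimal import Decimal

x = Decimal('3.1416') + Decimal('1.00')

# __str__ uses ordinary decimal notation, indistinguishable from a float:
print(str(x))     # 4.1416

# while __repr__ still reveals the type:
print(repr(x))    # Decimal('4.1416')
```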
Now that I'm almost over... oh well. I've just opened another can of worms.
Can someone please help me close this one? ;-)
Carlos Ribeiro
cribeiro at mail.inet.com.br