[Python-ideas] Floating point "closeness" Proposal Outline

Mon Jan 19 21:28:37 CET 2015

On Mon, Jan 19, 2015 at 11:51 AM, Ron Adam <ron3200 at gmail.com> wrote:

> Here is a gist with a sample implementation:
>
>
>> https://gist.github.com/PythonCHB/6e9ef7732a9074d9337a
>>
>
> For the most part I think it looks good.
>
> Boost describes both a weak and a strong version, but I didn't see why they
> chose the strong version.

Actually, Boost didn't choose; there is a flag you can set to select which
one to use. I don't want to do that: if you really want something special,
write it yourself. I did choose the "strong" version -- it just seemed more
conservative. Also it's what I think Steven chose for his version (though
not with "and", but the result is the same).
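
For reference, the two Boost-style tests can be sketched in Python as below.
This is only an illustration, not the code in the gist: the function names
and the tol=1e-8 default are made up here. The middle function shows how a
version written with min() instead of "and" gives the same answer as the
strong test.

```python
def is_close_strong(a, b, tol=1e-8):
    """Strong test: the difference must be within tol of BOTH values."""
    diff = abs(a - b)
    return diff <= tol * abs(a) and diff <= tol * abs(b)


def is_close_strong_min(a, b, tol=1e-8):
    """Same test without "and": scale by the smaller magnitude."""
    return abs(a - b) <= tol * min(abs(a), abs(b))


def is_close_weak(a, b, tol=1e-8):
    """Weak test: the difference must be within tol of EITHER value."""
    diff = abs(a - b)
    return diff <= tol * abs(a) or diff <= tol * abs(b)


# With a deliberately large tolerance the two tests can disagree:
# |100 - 111| = 11 is more than 10% of 100 but less than 10% of 111.
print(is_close_strong(100.0, 111.0, tol=0.1))      # False
print(is_close_strong_min(100.0, 111.0, tol=0.1))  # False
print(is_close_weak(100.0, 111.0, tol=0.1))        # True
```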

> I'm guessing that "strong" indicates it has a higher certainty that the
> numbers are within the specified tolerance.
>

exactly.

> By using "and", it excludes the case when one is in the range and the
> other is not.  I think this needs to be in the description.
>

I tried to capture that, but apparently not well ;-) -- I do think a lot of
care needs to be taken with the docs.

But I still think that in the common use case, it doesn't matter -- folks want
to know that the values are "approximately equal" to some tolerance (1e-8 or
....), and, in fact, the tolerance itself is likely to be approximate as
well (i.e. 8 or 9 decimal digits is fine). We need to be careful in the docs
so that people can get the tolerance they want if they really do care, but I
suspect that's unlikely to be the case often.

Has anyone in this thread, or in any test code you've seen or written,
provided a tolerance with more than one significant figure? I.e., I always
use 1e-12 or 1e-10; I've never used a tolerance anything like 0.000153427213.

As, in the common case, the tolerance is approximate, and usually small,
it doesn't matter which version we use: strong, weak, or declaring
which value to scale to. But I prefer a symmetric version, as I suspect that
will be the least surprising -- it's good to get the same answer every
time, even if it is approximate!
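
To make the symmetry point concrete, here is a small sketch comparing a test
that scales by the first argument only with one that scales by the smaller
magnitude. Both function names and the default tolerance are hypothetical,
chosen just for this example, and the tolerance in the calls is deliberately
huge so the effect is visible.

```python
def is_close_asymmetric(a, b, tol=1e-8):
    """Scale the tolerance by the first argument only (order matters)."""
    return abs(a - b) <= tol * abs(a)


def is_close_symmetric(a, b, tol=1e-8):
    """Scale by the smaller magnitude (order does not matter)."""
    return abs(a - b) <= tol * min(abs(a), abs(b))


# Swapping the arguments changes the asymmetric answer...
print(is_close_asymmetric(100.0, 111.0, tol=0.1))  # False: 11 > 10% of 100
print(is_close_asymmetric(111.0, 100.0, tol=0.1))  # True: 11 <= 10% of 111

# ...but never the symmetric one.
print(is_close_symmetric(100.0, 111.0, tol=0.1) ==
      is_close_symmetric(111.0, 100.0, tol=0.1))   # True
```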

> Your function can test for absolute distance, relative distance, or the
> greater of both.  You could have the defaults for both be zero, and raise
> an error if at least one isn't set to something other than zero.
>

good point -- I hadn't really thought about it that way, and despite my
saying that I didn't want an absolute tolerance function, I have in fact
provided one -- if you set the relative "tol" to zero and "min_tol" > zero,
you get an absolute tolerance test.

I prefer to have a useful default, rather than requiring every use to
specify it (though there is something to be said for making people think
about it!), but you are right, they should probably be renamed to something
like:

rel_tol and abs_tol, and the docs can make it clear that if you specify
both as greater than zero, it will return True if either one is satisfied.
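
A minimal sketch of what such a function might look like, assuming the strong
(scale by the smaller value) relative test. The name is_close, the default
rel_tol=1e-8, and the early a == b shortcut are assumptions made for this
example; they are not taken from the gist.

```python
def is_close(a, b, rel_tol=1e-8, abs_tol=0.0):
    """Return True if a and b are within rel_tol OR within abs_tol."""
    if rel_tol < 0.0 or abs_tol < 0.0:
        raise ValueError("tolerances must be non-negative")
    if a == b:
        # Shortcut (an assumption of this sketch): identical values,
        # including equal infinities, always count as close.
        return True
    diff = abs(a - b)
    rel_ok = diff <= rel_tol * min(abs(a), abs(b))  # strong relative test
    abs_ok = diff <= abs_tol                        # absolute test
    return rel_ok or abs_ok


print(is_close(1.0, 1.0 + 1e-10))                            # True
print(is_close(0.0, 1e-12))                                  # False
print(is_close(0.0, 1e-12, abs_tol=1e-9))                    # True
print(is_close(1000.0, 1000.5, rel_tol=0.0, abs_tol=1.0))    # True
print(is_close(1000.0, 1002.0, rel_tol=0.0, abs_tol=1.0))    # False
```

The second call shows why an absolute tolerance is useful at all: a purely
relative test can never pass when one of the values is exactly zero. The last
two calls set rel_tol to zero, which turns the function into a plain absolute
tolerance test.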

>> I hope we can come to some consensus that something like this is the way to
>> go.
>>
>
> Good examples will help with this.  It may also help with choosing a good
> name.
>

you mean use-case examples? rather than specific value examples?

> To me, the strong version is an "is-good" test, and the weak version is an
> "is-close" test.  I think it could be important to some people.
>
> I like the idea of being able to use these as a teaching tool to
> demonstrate how our ideas of closeness, equality, and inequality can be
> subjective.
>

Are you suggesting that we allow a flag for the user to set to choose
whether to use the weak or strong version? I'd rather not -- I see this as a
practical, works-most-of-the-time thing, not a teaching tool, or a
"provides every use case" tool.

> There are two cases...
>
> 1: (The weak version is required for this to work.)
>
> Two numbers are definitely not equivalent if they are further apart than
> the largest error amount.  (The larger number better indicates the
> largeness of the possible relative error.)
>
> And two numbers are close if you can't determine if they are equivalent,
> or not-equivalent with certainty.*
>
> (* "close numbers" may include equivalent numbers if you define it as a
> set of all definitely not-equivalent numbers.)
>
> 2: (The strong version is required for this to work.)
>
> A value is good if it's within a valid range with certainty.  It is less
> than the smaller relative range of either number.  The smaller number
> better indicates the magnitude of smallness.
>
> So case 1 should be used to test for errors, and case 2 should be used to
> test for valid ranges.
>
> It seems you have the 2nd case in mind, and that's fine.  Some of us were
> thinking of the first case, and possibly switching from one to the other
> during the discussion which is probably why it got confusing or repetitious
> at some points.
>

yes, I suppose I do -- and again, in the common use case, where the
tolerance is also approximate, it really doesn't matter.
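
One way to read the two cases together is as a three-way classification, as
sketched below. The function name classify and its labels are invented for
this illustration, and this is only one interpretation of the description
above.

```python
def classify(a, b, tol=1e-8):
    """Label a pair by which of the weak/strong tests it passes."""
    diff = abs(a - b)
    small = tol * min(abs(a), abs(b))  # bound used by the strong test
    large = tol * max(abs(a), abs(b))  # bound used by the weak test
    if diff <= small:
        return "good"       # passes the strong test (and so the weak one)
    if diff <= large:
        return "close"      # passes only the weak test
    return "different"      # fails both tests


# A large tolerance makes the middle band easy to hit:
print(classify(100.0, 105.0, tol=0.1))  # good
print(classify(100.0, 111.0, tol=0.1))  # close
print(classify(100.0, 120.0, tol=0.1))  # different

# With a small tolerance the middle band is very hard to land in:
print(classify(1.0, 1.0 + 5e-9))        # good
print(classify(1.0, 1.0 + 2e-8))        # different
```

When a and b have the same sign, the "close" band is only tol * |a - b| wide,
which is on the order of tol squared times the magnitude of the values. That
is why the choice between weak and strong rarely matters for small,
approximate tolerances.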

> I think both of these are useful, but you definitely need to be clear
> which one you are implementing, and to document it clearly.

yup.

Nice to know at least two of us seem to be coming to consensus ;-)

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov