Significant digits in a float?

Mon Apr 28 22:34:07 EDT 2014

On Mon, 28 Apr 2014 12:00:23 -0400, Roy Smith wrote:

[...]
> Fundamentally, these numbers have between 0 and 4 decimal digits of
> precision, 

I'm surprised that you have a source of data with variable precision, 
especially one that varies by a factor of TEN THOUSAND. The difference 
between 0 and 4 decimal digits is equivalent to measuring some lengths to 
the nearest metre, some to the nearest centimetre, and some to the 
nearest 0.1 of a millimetre. That's very unusual and I don't know what 
justification you have for combining such a mix of data sources.

One possible interpretation of your post is that you have a source of 
floats, where all the numbers are actually measured to the same 
precision, and you've simply misinterpreted the fact that some of them 
look like they have less precision. Since you indicate that 4 decimal 
digits is the maximum, I'm going with 4 decimal digits. So if your data 
includes the float 23.5, that's 23.5 measured to a precision of four 
decimal places (that is, it's 23.5000, not 23.5001 or 23.4999).

On the other hand, if you're getting your values as *strings*, that's 
another story. If you can trust the strings, they'll tell you how many 
decimal places: "23.5" is only one decimal place, "23.5000" is four.

But then what to make of your later example?

> 40.75280000000001 ==> 4

Python floats (C doubles) are quite capable of distinguishing between 
40.7528 and 40.75280000000001. They are distinct numbers:

py> 40.75280000000001 - 40.7528
7.105427357601002e-15

so if a number is recorded as 40.75280000000001 presumably it is because 
it was measured as 40.75280000000001. (How that precision can be 
justified, I don't know! Does it come from the Large Hadron Collider?) If 
it were intended to be 40.7528, I expect it would have been recorded as 
40.7528. What reason do you have to think that something recorded to 14 
decimal places was only intended to have been recorded to 4?

Without knowing more about how your data is generated, I can't advise you 
much, but the whole scenario as you have described it makes me think that 
*somebody* is doing something wrong. Perhaps you need to explain why 
you're doing this, as it seems numerically broken.

> Is there any clean way to do that?  The best I've come up with so far is
> to str() them and parse the remaining string to see how many digits it
> put after the decimal point.

I really think you need to go back to the source. Trying to infer the 
precision of the measurements from the accident of the string formatting 
seems pretty dubious to me.

But I suppose if you wanted to infer the number of digits after the 
decimal point, excluding trailing zeroes (why, I do not understand), up 
to a maximum of four digits, then you could do:

s = "%.4f" % number  # rounds to four decimal places
s = s.rstrip("0")  # ignore trailing zeroes, whether significant or not
count = len(s.split(".")[1])

Since "%.4f" always produces fixed-point output, this works even for 
numbers like 1.23e19. If you parse the output of str() or repr() instead, 
you'll have to handle exponential format and parse the string more 
carefully. (Keep in mind that all floats above 2**52 are integer-valued.)
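
For what it's worth, the three-line recipe above can be wrapped in a 
function. This is only a sketch: the name count_decimals and the 
max_places parameter are my own invention for illustration, and it 
assumes finite floats.

```python
def count_decimals(number, max_places=4):
    """Count digits after the decimal point, ignoring trailing zeroes.

    Hypothetical helper: the value is rounded to max_places first, so
    anything beyond that many places is thrown away.
    """
    s = "%.*f" % (max_places, number)  # round to max_places decimal places
    s = s.rstrip("0")                  # drop trailing zeroes
    return len(s.partition(".")[2])    # digits left after the point


for x in (38.0, 23.5, 38.12, 40.7528, 40.75280000000001):
    print(x, "==>", count_decimals(x))
```

Note that 38.0 comes out as zero decimal places here, because the 
trailing zero is stripped along with all the others.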

> The numbers are given to me as Python floats; I have no control over
> that.

If that's the case, what makes you think that two floats from the same 
data set were measured to different precision? Given that you don't see 
strings, only floats, I would say that your problem is unsolvable. 
Whether I measure something to one decimal place and get 23.5, or four 
decimal places and get 23.5000, the float you see will be the same.
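
A quick check makes the point. Once the strings have been converted to 
floats, nothing remains to tell the two measurements apart (the variable 
names here are just for illustration):

```python
a = float("23.5")      # recorded to one decimal place
b = float("23.5000")   # recorded to four decimal places

print(a == b)            # the two values compare equal
print(repr(a), repr(b))  # and they display identically
```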

Perhaps you ought to be using Decimal rather than float. Floats have a 
fixed precision, while Decimals can be configured. Then the right way to 
answer your question is to inspect the number:

py> from decimal import Decimal as D
py> x = D("23.5000")
py> x.as_tuple()
DecimalTuple(sign=0, digits=(2, 3, 5, 0, 0, 0), exponent=-4)

The number of decimal places is -exponent (here, 4).
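
If the values are available as strings, a small helper along these lines 
would do the inspection for you. It is a sketch only: decimal_places is a 
hypothetical name, and it assumes the string holds a finite number (for 
NaN and infinities the exponent field is not an integer).

```python
from decimal import Decimal


def decimal_places(text):
    """Hypothetical helper: decimal places recorded in a numeric string."""
    exponent = Decimal(text).as_tuple().exponent
    # A non-negative exponent (e.g. "38" or "1E+3") means no decimal places.
    return max(0, -exponent)


for s in ("23.5", "23.5000", "38.0", "38.0000", "38.12"):
    print(s, "==>", decimal_places(s))
```

Unlike the float version, this keeps trailing zeroes, so "38.0" and 
"38.0000" give different answers.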

> I'm willing to accept that fact that I won't be able to differentiate
> between float("38.0") and float("38.0000").  Both of those map to 1,
> which is OK for my purposes.

That seems... well, "bizarre and wrong" are the only words that come to 
mind. If I were recording data as "38.0000" and you told me I had 
measured it to only one decimal place accuracy, I wouldn't be too 
pleased. Maybe if I understood the context better?

How about 38.12 and 38.1200?

By the way, you contradict yourself here. Earlier, you described 38.0 as 
having zero decimal places (which is wrong). Here you describe it as 
having one, which is correct, and then in a later post you describe it as 
having zero decimal places again.

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/