Curious Omission In New-Style Formats

Mon Jul 11 23:38:28 EDT 2016

On Tue, 12 Jul 2016 04:38 am, Ian Kelly wrote:

> In what way do the leading zeroes in "00123" add to the precision of
> the number? 00123 is the same quantity as 123 and represents no more
> precise a measurement. 

You guys... next you're going to tell me that 1.23 and 1.2300 are the same
quantity and represent no more precise a measurement :-) And keep adding
zeroes 1.230000000... and claim infinite precision.

That's not now "precision" works. Its true that *mathematically* leading
zeroes don't change the number, but neither do trailing zeroes.
Nevertheless, we *interpret* them as having meaning, as indicators of the
precision of measurement in the mathematical sense.

In a moment, I'm going to attempt to justify a reasonable meaning for
leading zeroes. But that's actually not relevant. printf (and Python's %
string operator) interprets the precision field in the mathematical sense
for floats, but that's not the only sense possible. The word "precise" has
a number of meanings, including:

- exactness
- accurate
- strict conformity to a rule
- politeness and nicety
- the quality of being reproducible

For integers, printf and % interpret the so-called "precision" field of the
format string not as a measurement precision (number of decimal places),
but as the number of digits to use (which is different from the total field
width). For example:

py> "%10.8x" % 123
'  0000007b'

How anyone can possibly claim this makes no sense is beyond me! Is the
difference between "number of digits" and "number of decimal places" really
so hard to grok? I think not.

And it's clearly useful: with integers, particular if they represent
fixed-size byte quantities, it is very common to use leading zeroes. To a
programmer the hex values 7b, 007b and 0000007b have very different
meanings: the first is a byte, the second may be a short int, and the third
may be a long int.

Why shouldn't we use the "precision" field for this? It doesn't mean what
scientists mean by precision, but so what? That's not important. Scientists
don't usually have to worry about leading zeroes, but programmers do, and
its useful to distinguish between the *total* width (padded with spaces)
and the *number* width (padded with zeroes).

I think that printf and % get this right, and format() gets it wrong.

[...]
>>> If you truly wanted to format the number with a precision
>>> of 5 digits, it would look like this:
>>>
>>>     0x123.00
>>
>> Er, no, because its an integer.
> 
> Which is why if you actually want to do this, you should convert it to
> a decimal or a float first (of course, those don't support hexadecimal
> output, so if you actually want hexadecimal output *and* digits after
> the (hexa)decimal point, then I think you would just have to roll your
> own formatting at that point).

What? No no no. Didn't you even look at Lawrence's example? He doesn't want
to format the number with decimal places at all.

Converting an integer to a float just to use the precision field is just
wrong. That's like saying that "1234".upper() doesn't make sense because
digits don't have uppercase forms, and if you really want to convert "1234"
to uppercase you should spell it out in words first:

"onetwothreefour".upper()

Earlier, I said that I would attempt to give a justification for leading
zeroes in terms of measurement too. As I said, this is strictly irrelevant,
but I thought it was interesting and so I'm going to share. You can stop
reading now if you prefer.

If you measure something with a metre ruler marked in centimetres, getting
1.23 metres, that's quite different from measuring it with a ruler marked
in tenths of a millimetres and getting 1.2300. That's basic usage for
precision in terms of number of decimal places and I'm sure we all agree
about that.

Now lets go the other way. How to you distinguish between a distance
measured using an unmarked metre stick, giving us an answer of 123 metres,
versus something measured with a 10km ruler(!) with one metre markings?
Obviously with *leading* zeroes rather than trailing zeroes. If I lay out
the one metre stick 123 times, then I write the measurement as 123. If I
lay out my 10km ruler and note the position, I measure:

zero tens of kilometres;
zero kilometres;
three hundreds-of-metres;
two tens-of-metres;
three metres;

giving 00123 metres of course. Simple!

Of course, it's rare that we do that. Why would we bother to distinguish the
two situations? And who the hell has a 10km long measuring stick?

If I'm a scientist, I'll probably write them both as 123m and not care about
the size of the measuring stick, only about the smallest marking on it. At
least for measuring straight-line distances, the size of the measuring
stick generally doesn't matter.

It *does* matter for measuring curves, but paradoxically the bigger the
measuring stick (the more leading zeroes) the worse your measurement is
likely to be. This is the problem of measuring coastlines and is related to
fractal dimension. Suppose I lay my 10km long measuring stick along some
piece of coastline, and measure it as 00123 metres. (It's a *thought
experiment*, don't hassle me about the unfeasibly large stick. Divide
everything by a thousand and call it a 10m stick marked in millimetres if
you like.) Chances are that if I used a 1 metre measuring stick, and
followed the contour of the coast more closely, I'd get a different number.
So the more leading zeroes, the less accurate your measurement is likely to
be. But interesting as this is, for most purposes either we're not
measuring a curve, or we are but pretend we're not and ignore the fractal
dimension.

But as I said, none of this is strictly relevant to the question of
interpreting the "precision" field of format strings as leading zeroes for
integers. That has a common, standard interpretation as "number of digits"
(as opposed to "number of decimal places") and is very useful. I maintain
that format() has made a mistake by forbidding it.

-- 
Steven
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.