[Python-ideas] float('∞')=float('inf')

Vernon D. Cole vernondcole at gmail.com
Sat Jul 13 08:32:59 CEST 2013


For the benefit of those who read this in ASCII, I will include Unicode
escape sequences in what follows. I prefer code that is readable in ASCII (as
PEP 8 suggests), which is one reason I somewhat dislike the
proposal. I had to go to the archives even to read the subject line.
Nevertheless, I think that, in the Unicode world, the proposal is sound.

The question was asked earlier why the Python int() and float() functions
do not allow Greek numbers, when they do allow numbers from many other
language character sets.

The answer is in the documentation for int():

> The numeric literals accepted include the digits 0 to 9 or any Unicode
> equivalent (code points with the Nd property).
>
The "Nd" characters are decimal digits of systems which use positional
notation (i.e. Arabic numbers).  The Greeks used decimal numbers, but used
different symbols for one, ten, hundred, thousand, (etc.) and added them
together, much like the system of Roman numbers we are familiar with.
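The Nd rule can be seen directly with the standard unicodedata module. ARABIC-INDIC DIGIT THREE and FOUR are category Nd, so int() accepts them. GREEK ACROPHONIC THESPIAN TWO has a numeric value, but it is not a decimal digit, so int() refuses it. The particular characters are just convenient examples:

```python
import unicodedata

# ARABIC-INDIC DIGIT THREE and FOUR: positional decimal digits (category Nd)
arabic_indic = '\u0663\u0664'
print([unicodedata.category(ch) for ch in arabic_indic])  # ['Nd', 'Nd']
print(int(arabic_indic))                                  # 34

# GREEK ACROPHONIC THESPIAN TWO: has a numeric value, but is not category Nd
g2 = '\U0001015c'
print(unicodedata.numeric(g2))           # 2.0
print(unicodedata.category(g2) == 'Nd')  # False
try:
    int(g2)
except ValueError as exc:
    print('int() refuses it:', exc)
```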

The int() parser expects Arabic-formatted numbers, so it will not correctly
interpret other systems of notation. In order to read such numbers, you
need to use a parser that was built for them. PEP 313 proposed adding
Roman numeral literals to Python, and it was
rejected.
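Since nothing of the kind is in the standard library, here is a minimal sketch of an ASCII Roman numeral parser using the usual subtractive rule. The name parse_roman is hypothetical, and the sketch does not validate well-formedness; it only evaluates the letters:

```python
# Values of the ASCII letters used in Roman numerals.
ROMAN_VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}


def parse_roman(text: str) -> int:
    """Convert an ASCII Roman numeral such as 'MCMXCVIII' to an int.

    A symbol is subtracted when a larger symbol follows it (IV == 4) and
    added otherwise. Malformed input such as 'IIII' or 'VX' is not
    rejected; it is simply evaluated by the same rule.
    """
    letters = text.strip().upper()
    if not letters or any(ch not in ROMAN_VALUES for ch in letters):
        raise ValueError(f'not a Roman numeral: {text!r}')
    values = [ROMAN_VALUES[ch] for ch in letters]
    total = 0
    for i, value in enumerate(values):
        # Subtract when the next symbol is larger, otherwise add.
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


print(parse_roman('MCMXCVIII'))  # 1998
print(parse_roman('xiv'))        # 14
```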

Several algorithms for reading Roman numbers encoded using ASCII values
['i','v','x','L', (etc.)] have been published.  The one I wrote goes a bit
further -- it also tries to read the value of unicodedata.numeric() for
each character of its input string and sums them (sort of). It would
therefore convert all of the Greek and other characters mentioned in this
thread and return a value for them. If a Greek author followed Roman
formatting rules, it would return a _correct_ value, too. If, on the other
hand, he put a smaller-valued digit on the left side of a larger digit, he
would probably not appreciate the resulting subtraction.
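The romanclass source is not part of this message, so the following is only a rough sketch of the behaviour just described, not its actual implementation: look up each character (ASCII Roman letters from a small table, everything else via unicodedata.numeric()), then add or subtract by the Roman rule. The names char_value and sum_numerals are hypothetical:

```python
import unicodedata

# unicodedata.numeric('X') raises ValueError, so ASCII Roman letters need a table.
ASCII_ROMAN = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}


def char_value(ch: str) -> float:
    """Value of one character: ASCII Roman letter, else its Unicode numeric value."""
    if ch.upper() in ASCII_ROMAN:
        return float(ASCII_ROMAN[ch.upper()])
    value = unicodedata.numeric(ch, None)  # None when the character has no numeric value
    if value is None:
        raise ValueError(f'no numeric value for {ch!r}')
    return value


def sum_numerals(text: str) -> float:
    """Sum character values, subtracting any value smaller than its right neighbour."""
    values = [char_value(ch) for ch in text]
    total = 0.0
    for i, value in enumerate(values):
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


g2 = '\U0001015c'     # GREEK ACROPHONIC THESPIAN TWO
g5000 = '\U00010172'  # GREEK ACROPHONIC THESPIAN FIVE THOUSAND
print(sum_numerals(g5000 + g2))  # 5002.0
print(sum_numerals(g2 + g5000))  # 4998.0 (the 2 on the left is subtracted)
```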


> >>> import unicodedata
> >>> import romanclass as Roman
>
> >>> g2 = '\U0001015c'
> >>> unicodedata.name(g2)
> 'GREEK ACROPHONIC THESPIAN TWO'
> >>> g5000 = '\U00010172'
> >>> unicodedata.name(g5000)
> 'GREEK ACROPHONIC THESPIAN FIVE THOUSAND'
> >>> g5002 = g5000 + g2 #  string concatenation (not addition)
> >>> g5002
> '\U00010172\U0001015c'
> >>> Roman.Roman(g5002)
> Roman(5002)
> >>> print(Roman.Roman(g5002))
> ↁII
> >>> # but -- since Roman math subtracts values on the left...
> >>> print(Roman.Roman(g2 + g5000))
> MↁCMXCVIII
>

This is all an unimportant side effect of my attempt to support actual
Unicode Roman numbers:

> >>> u'\u2167'
> 'Ⅷ'
> >>> eight = Roman.Roman(u'\u2167')
> >>> print(eight + 10)  # NOTE: mathematical addition
> XVIII
>
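For the dedicated Roman numeral characters such as U+2167, the standard library already exposes two useful facts: unicodedata.numeric() gives the value, and NFKC normalization decomposes the character into plain ASCII letters that an ordinary Roman parser could read. This is a sketch of checking that, not necessarily how romanclass does it:

```python
import unicodedata

eight = '\u2167'  # the single character 'Ⅷ'
print(unicodedata.name(eight))               # ROMAN NUMERAL EIGHT
print(unicodedata.category(eight))           # Nl: a letter-like number, not an Nd digit
print(unicodedata.numeric(eight))            # 8.0
print(unicodedata.normalize('NFKC', eight))  # VIII: compatibility decomposition to ASCII
```

Because the category is Nl rather than Nd, int('\u2167') raises ValueError, consistent with the int() documentation quoted earlier.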

This all assumes that we are talking about Acrophonic (or Herodian or
Attic) numerals. The Greeks also used Alphabetic (also called Milesian,
Alexandrian, or Ionic) numerals. In that system, the value of pi ('\u03c0')
is 80 (and has nothing to do with the circumference of a circle.)  That
usage, however, is not recognized by Unicode:

> >>> '\u03c0'
> 'π'
> >>> pi = '\u03c0'
> >>> unicodedata.name(pi)
> 'GREEK SMALL LETTER PI'
> >>> unicodedata.numeric(pi)
> Traceback (most recent call last):
>   File "<pyshell#113>", line 1, in <module>
>     unicodedata.numeric(pi)
> ValueError: not a numeric character
> >>>

[As a complete side note: Greeks pronounce the name of that letter as
"pea", not "pie".]

That agrees with Unicode's non-recognition of the numeric value of ASCII
letters used in Roman numerals:

> >>> unicodedata.numeric('X')
> Traceback (most recent call last):
>   File "<pyshell#114>", line 1, in <module>
>     unicodedata.numeric('X')
> ValueError: not a numeric character
> >>>
>

Any numeric usage requires a definition of how the string is to be parsed:

> >>> Roman.Roman('X')
> Roman(10)
> >>> float(Roman.Roman('X'))
> 10.0
> >>>
>
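To make that concrete for the alphabetic (Milesian) Greek system mentioned above, here is a sketch of one such definition. The class name Milesian is hypothetical, and so is the choice to handle only 1 to 999 (the lower keraia used for thousands is ignored). Much as float(Roman.Roman('X')) works in the session above, float() accepts any object that defines __float__:

```python
# Alphabetic (Milesian) numerals: nine letters each for units, tens and hundreds.
# Written with escapes so the source stays readable in ASCII.
UNITS = '\u03b1\u03b2\u03b3\u03b4\u03b5\u03db\u03b6\u03b7\u03b8'     # alpha..theta, stigma = 6
TENS = '\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03bf\u03c0\u03df'      # iota..pi, koppa = 90
HUNDREDS = '\u03c1\u03c3\u03c4\u03c5\u03c6\u03c7\u03c8\u03c9\u03e1'  # rho..omega, sampi = 900

MILESIAN_VALUES = {
    letter: digit * scale
    for scale, letters in ((1, UNITS), (10, TENS), (100, HUNDREDS))
    for digit, letter in enumerate(letters, start=1)
}
# GREEK NUMERAL SIGN (keraia) and its canonical equivalent MODIFIER LETTER PRIME.
NUMERAL_SIGNS = ('\u0374', '\u02b9')


class Milesian:
    """Hypothetical wrapper that parses alphabetic Greek numerals from 1 to 999."""

    def __init__(self, text: str) -> None:
        # casefold() (unlike lower()) never produces final sigma, so sigma stays 200.
        letters = text.strip().casefold()
        for sign in NUMERAL_SIGNS:
            letters = letters.replace(sign, '')
        if not letters or any(ch not in MILESIAN_VALUES for ch in letters):
            raise ValueError(f'not an alphabetic Greek numeral: {text!r}')
        # The system is purely additive: the value is the sum of the letters.
        self.value = sum(MILESIAN_VALUES[ch] for ch in letters)

    def __int__(self) -> int:
        return self.value

    def __float__(self) -> float:
        # float(obj) calls obj.__float__() for objects like this one.
        return float(self.value)

    def __repr__(self) -> str:
        return f'Milesian({self.value})'


print(Milesian('\u03c0'))                      # Milesian(80)
print(float(Milesian('\u03c1\u03ba\u03b3')))  # 123.0
```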

So, forget all of this noise about all of the other possible things that
could be done with extended definitions of float(). Any of those would
require another definition, and another PEP.  This proposal is for only one
thing -- to make the following happen:

> >>> inf = '\u221e'
> >>> float(inf)
> inf
> >>>
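As things stand, float() accepts 'inf' and 'infinity' (in any case, with an optional sign) but raises ValueError for '\u221e'. Until something like this proposal is accepted, a caller could get the requested behaviour from a small wrapper. The name parse_float is hypothetical; this is only a sketch:

```python
import math

INFINITY_SIGN = '\u221e'  # INFINITY, the character in the subject line


def parse_float(text: str) -> float:
    """Behave like float(text), but also accept the infinity sign with an optional sign."""
    stripped = text.strip()
    sign = ''
    if stripped[:1] in ('+', '-'):
        sign, stripped = stripped[0], stripped[1:]
    if stripped == INFINITY_SIGN:
        return float(sign + 'inf')
    return float(text)  # everything else: float()'s own rules and errors


try:
    float(INFINITY_SIGN)
except ValueError as exc:
    print('float() today:', exc)
print(parse_float(INFINITY_SIGN))        # inf
print(parse_float('-' + INFINITY_SIGN))  # -inf
print(math.isinf(parse_float(' +' + INFINITY_SIGN + ' ')))  # True
```

Delegating everything else to float() keeps its existing error messages and accepted formats unchanged.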

Mark me as +0
--
Vernon Cole