Devanagari int literals [was Re: Should non-security 2.7 bugs be fixed?]

Sun Jul 19 14:07:58 EDT 2015

On Sun, Jul 19, 2015 at 10:56 PM, Tim Chase
<python.list at tim.thechases.com> wrote:
> Agreed that it's pretty awesome.  It seems to have some holes though:
>
> Python 3.4.2 (default, Oct  8 2014, 10:45:20)
> [GCC 4.9.1] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> print('\N{VULGAR FRACTION ONE EIGHTH}')
> ⅛
> >>> print(float('\N{VULGAR FRACTION ONE EIGHTH}'))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: could not convert string to float: '⅛'
> >>> print('\N{ROMAN NUMERAL NINE}')
> Ⅸ
> >>> int('\N{ROMAN NUMERAL NINE}')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: invalid literal for int() with base 10: 'Ⅸ'
> >>> print('\N{ROMAN NUMERAL TEN THOUSAND}')
> ↂ
> >>> int('\N{ROMAN NUMERAL TEN THOUSAND}')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: invalid literal for int() with base 10: 'ↂ'

The int() and float() functions accept, if I'm not mistaken, anything
with Unicode category "Nd" (Number, decimal digit). In your examples,
the fraction (U+215B) is No, and the Roman numerals (U+2168, U+2182)
are Nl, so they're not supported. Adding support for these forms might
be accepted as a feature request, but it's not a bug.
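
A quick way to check the categories, as a rough sketch using the
standard unicodedata module. The sample set and the try_int() helper
are only there for illustration; they're not anything from the
interpreter itself.

```python
import unicodedata

# Characters from the examples above, plus one Devanagari digit for contrast.
samples = {
    "DEVANAGARI DIGIT FOUR": "\N{DEVANAGARI DIGIT FOUR}",
    "VULGAR FRACTION ONE EIGHTH": "\N{VULGAR FRACTION ONE EIGHTH}",
    "ROMAN NUMERAL NINE": "\N{ROMAN NUMERAL NINE}",
    "ROMAN NUMERAL TEN THOUSAND": "\N{ROMAN NUMERAL TEN THOUSAND}",
}

def try_int(text):
    """Illustrative helper: return int(text), or None if int() rejects it."""
    try:
        return int(text)
    except ValueError:
        return None

# Map each name to (general category, result of int()).
results = {
    name: (unicodedata.category(ch), try_int(ch))
    for name, ch in samples.items()
}

for name, (category, value) in results.items():
    print(f"{name:28} {category}  {value}")

# Multi-digit strings of Nd characters work too (Devanagari 1, 2, 3).
devanagari_123 = try_int("\u0967\u0968\u0969")
```

Only the Nd character converts; the No and Nl ones come back as None
here, which matches the tracebacks quoted above.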

(I may be wrong about the definition being based on category. It may
be based on the "Numeric type" of each character. But again, the
characters that are accepted would be those which have a Decimal type,
not merely Digit or Numeric, and again, it'd be a feature request to
expand that.)
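
For the "Numeric type" angle, unicodedata exposes three separate
lookups: decimal(), digit() and numeric(). Here's a small sketch
comparing them against what int() takes. As far as I can tell it's
the decimal value that int() goes by, but treat that as my reading
rather than a spec. The numeric_profile() and accepted_by_int() names
are made up for the example.

```python
import unicodedata

def numeric_profile(ch):
    """Return (decimal, digit, numeric) values for ch, None where undefined."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

def accepted_by_int(ch):
    """Illustrative helper: True if int() parses the single character."""
    try:
        int(ch)
    except ValueError:
        return False
    return True

chars = [
    "\N{DEVANAGARI DIGIT FOUR}",       # has decimal, digit and numeric values
    "\N{SUPERSCRIPT TWO}",             # digit and numeric values, no decimal
    "\N{VULGAR FRACTION ONE EIGHTH}",  # numeric value only
    "\N{ROMAN NUMERAL NINE}",          # numeric value only
]

profiles = {ch: numeric_profile(ch) for ch in chars}
accepted = {ch: accepted_by_int(ch) for ch in chars}

for ch in chars:
    print(unicodedata.name(ch), profiles[ch], accepted[ch])
```

The superscript two is the interesting case: it has a digit value but
no decimal value, and int() turns it down.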

ChrisA