Devanagari int literals [was Re: Should non-security 2.7 bugs be fixed?]

MRAB python at mrabarnett.plus.com
Sun Jul 19 18:13:48 EDT 2015


On 2015-07-19 22:16, Chris Angelico wrote:
> On Mon, Jul 20, 2015 at 5:55 AM, Tim Chase
> <python.list at tim.thechases.com> wrote:
>> On 2015-07-20 04:07, Chris Angelico wrote:
>>> The int() and float() functions accept, if I'm not mistaken,
>>> anything with Unicode category "Nd" (Number, decimal digit). In
>>> your examples, the fraction (U+215B) is No, and the Roman numerals
>>> (U+2168, U+2182) are Nl, so they're not supported. Adding support
>>> for these forms might be accepted as a feature request, but it's
>>> not a bug.
>>
>> Ah, that makes sense.  Some simple testing (thanks, unicodedata
>> module) supports your conjecture.
>>
>> It's not a particularly big deal so not really worth the brain-cycles
>> to add support for them.  Just upon hearing "Python's int() does
>> smart things with Unicode characters", those were some of my first
>> characters to try.  The failure struck me as odd until you explained
>> the simple difference.
>
> The other part of the problem is: What should float("2⅛3") be? Should
> it be equal to 21.0/83.0? Should the first part be parsed as a classic
> mixed number (2 + 1/8), and then what should the 3 mean? While it's
> easy to see what an individual character should represent (just check
> unicodedata.numeric(ch) - for ⅛ it's 0.125), the true meaning of a
> string of such characters is less than clear. Similarly, Roman
> numerals aren't meant to be used after the decimal point, so "Ⅸ.Ⅴ"
> does not normally mean nine and a half... not to mention the confusing
> situation that "ⅠⅤ" would naively parse as 15 but "Ⅳ" is definitely 4.
> Since these kinds of complexities exist, it's safest to reserve this
> level of parsing for a special-purpose function. If someone can come
> up with a really strong argument for the float() and int()
> constructors interpreting these, I'd expect to see it deployed as a
> third-party module first, before being pointed out as "see, you can
> use float() for all these, but if you want to use those, you should
> use Float() instead". (Incidentally, I fully expect to see, some day,
> pytz.localize() semantics brought into the standard library
> datetime.datetime class, for precisely this reason.)
>
> Unicode is awesome, but it's not a panacea :)
>
What's the result of, say, float('1e.3')?

It raises an exception.

So float("2⅛3") should also raise an exception.
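
For what it's worth, a quick session bears out both the category rule
quoted above and the exception behaviour. This is only a minimal sketch,
assuming Python 3; try_parse is a hypothetical helper written for this
illustration, not something in the standard library.

```python
import unicodedata

def try_parse(func, text):
    """Return func(text), or the string 'ValueError' if parsing fails."""
    try:
        return func(text)
    except ValueError:
        return "ValueError"

# Sample strings, written as escapes so the code points are unambiguous.
samples = {
    "Devanagari digits 1 2 3": "\u0967\u0968\u0969",  # category Nd
    "vulgar fraction one eighth": "\u215b",           # category No
    "Roman numeral nine": "\u2168",                   # category Nl
}

for label, text in samples.items():
    categories = [unicodedata.category(ch) for ch in text]
    print(label, categories, try_parse(int, text))

# A malformed ASCII literal is rejected rather than guessed at...
print(try_parse(float, "1e.3"))
# ...and the mixed digit/fraction string is rejected the same way.
print(try_parse(float, "2\u215b3"))

# The per-character value is still available for special-purpose parsers.
print(unicodedata.numeric("\u215b"))
```

Only the Nd string converts; the No and Nl characters, "1e.3" and
"2⅛3" all end up as ValueError, while unicodedata.numeric() still
reports 0.125 for the fraction on its own.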



