[Python-Dev] Python and the Unicode Character Database

Mon Nov 29 00:23:01 CET 2010

2010/11/28 M.-A. Lemburg <mal at egenix.com>:
>
>
> "Martin v. Löwis" wrote:
>>>> >>> float('١٢٣٤.٥٦')
>>>> 1234.56
>>
>> I think it's a bug that this works. The definition of the float builtin says:
>>
>> Convert a string or a number to floating point. If the argument is a
>> string, it must contain a possibly signed decimal or floating point
>> number, possibly embedded in whitespace. The argument may also be
>> '[+|-]nan' or '[+|-]inf'.
>>
>> Now, one may wonder what precisely a "possibly signed floating point
>> number" is, but most likely, this refers to
>>
>> floatnumber   ::=  pointfloat | exponentfloat
>> pointfloat    ::=  [intpart] fraction | intpart "."
>> exponentfloat ::=  (intpart | pointfloat) exponent
>> intpart       ::=  digit+
>> fraction      ::=  "." digit+
>> exponent      ::=  ("e" | "E") ["+" | "-"] digit+
>> digit         ::=  "0"..."9"
>
> I don't see why the language spec should limit the wealth of number
> formats supported by float().
>
> It is not uncommon for Asians and other non-Latin script users to
> use their own native script symbols for numbers. Just because these
> digits may look strange to someone doesn't mean that they are
> meaningless or should be discarded.

That's different. Python doesn't assign any semantic meaning to the
characters in identifiers, but support for non-Latin numerals could
change the meaning of a program dramatically, so it needs to be
well specified. Whether int() should do this is debatable. I, for one,
think this kind of support belongs in the locale module.
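To make the two positions concrete, here is a minimal sketch (not taken from the thread) contrasting a parser that follows only the ASCII grammar and docstring quoted above with an explicit, opt-in conversion of Unicode decimal digits. `strict_float` and `to_ascii_digits` are hypothetical helpers; they are not part of Python or of the locale module. The regex is illustrative, not CPython's actual parser: it deliberately rejects forms the real float() accepts, such as "infinity" and (since Python 3.6) underscores in digit groups.

```python
import re
import unicodedata

# The floatnumber grammar quoted above, restricted to ASCII digits "0"..."9".
# Plain integers ("decimal"), surrounding whitespace, an optional sign, and
# nan/inf are allowed because the float() docstring quoted above allows them.
_DIGITS = r"[0-9]+"
_FLOAT_RE = re.compile(
    rf"""
    \s*
    [+-]?
    (?:
        (?:{_DIGITS})?\.{_DIGITS}(?:[eE][+-]?{_DIGITS})?   # [intpart] fraction [exponent]
      | {_DIGITS}\.?(?:[eE][+-]?{_DIGITS})?                # intpart, "intpart.", optional exponent
      | nan | inf
    )
    \s*
    """,
    re.VERBOSE | re.IGNORECASE,
)


def strict_float(s: str) -> float:
    """Hypothetical helper: accept only the ASCII float syntax, then convert."""
    if _FLOAT_RE.fullmatch(s) is None:
        raise ValueError(f"not an ASCII float literal: {s!r}")
    return float(s)


def to_ascii_digits(s: str) -> str:
    """Hypothetical helper: explicitly map Unicode decimal digits to ASCII.

    unicodedata.decimal() returns the decimal value of characters such as
    U+0661 ARABIC-INDIC DIGIT ONE; other characters are left unchanged.
    """
    out = []
    for ch in s:
        value = unicodedata.decimal(ch, None)
        out.append(str(value) if value is not None else ch)
    return "".join(out)
```

With these helpers, `strict_float('١٢٣٤.٥٦')` raises ValueError, while `strict_float(to_ascii_digits('١٢٣٤.٥٦'))` returns 1234.56. The design point is that the digit conversion becomes a visible, deliberate step at the call site rather than an implicit feature of float() itself.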

-- 
Regards,
Benjamin