[Python-Dev] Python and the Unicode Character Database

"Martin v. Löwis" martin at v.loewis.de
Fri Dec 3 00:19:20 CET 2010


On 02.12.2010 23:43, M.-A. Lemburg wrote:
> Eric Smith wrote:
>>> The current behavior should go nowhere; it is not useful. Something very
>>> similar to the current behavior (but done correctly) should go into the
>>> locale module.
>>
>> I agree with everything Martin says here. I think the basic premise is:
>> you won't find strings "in the wild" that use non-ASCII digits but do
>> use the ASCII dot as a decimal point. And that's what float() is looking
>> for. (And that doesn't even begin to address what it expects for an
>> exponent 'e'.)
> 
> http://en.wikipedia.org/wiki/Decimal_mark
> 
> "In China, comma and space are used to mark digit groups because dot is used as decimal mark."

I may be misinterpreting that, but I think that refers to the case of
writing numbers using Arabic digits.
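
For context, the float() behavior under discussion can be checked
directly. The sketch below assumes a current CPython 3 interpreter; the
helper name float_accepts is made up for illustration. It shows that
float() takes Arabic-Indic digits together with the ASCII dot and an
ASCII 'e' exponent, but rejects the Arabic decimal separator U+066B.

```python
# Arabic-Indic digits 1 2 3 . 4 5, written with escapes to avoid
# bidirectional rendering problems in the source text.
ARABIC_INDIC_WITH_ASCII_DOT = "\u0661\u0662\u0663.\u0664\u0665"
# Same digits, but with ARABIC DECIMAL SEPARATOR (U+066B) instead of '.'.
ARABIC_INDIC_WITH_ARABIC_SEP = "\u0661\u0662\u0663\u066b\u0664\u0665"
# Arabic-Indic 1, ASCII 'e', Arabic-Indic 2.
ARABIC_INDIC_WITH_EXPONENT = "\u0661e\u0662"


def float_accepts(text):
    """Return True if float() parses text, False if it raises ValueError."""
    try:
        float(text)
    except ValueError:
        return False
    return True


print(float(ARABIC_INDIC_WITH_ASCII_DOT))           # 123.45
print(float_accepts(ARABIC_INDIC_WITH_ARABIC_SEP))  # False
print(float(ARABIC_INDIC_WITH_EXPONENT))            # 100.0
```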

"Chinese" digits are used, e.g., in the Suzhou numerals:

http://en.wikipedia.org/wiki/Suzhou_numerals

This system has no decimal point at all. Instead, a second line
(below or to the left of the actual digits) gives the power of ten and
the unit of measurement (i.e. similar to scientific notation,
but with ideographs for the powers of ten).

In another writing system, they use 点 (U+70B9) as the decimal
separator, see

http://en.wikipedia.org/wiki/Chinese_numerals#Fractional_values

In the same system, the integral part uses multipliers, i.e.
12345 is [1][10000][2][1000][3][100][4][10][5]; the fractional
part uses regular digits.
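
To make the multiplier scheme concrete, here is a minimal sketch of a
parser for it. Everything in it is an assumption made for illustration:
the function name parse_chinese_number, the hand-written digit and
multiplier tables, and the restriction to at most one digit in front of
each multiplier (as in the 12345 example). It is not a general
Chinese-numeral parser and has nothing to do with what float() does.

```python
from decimal import Decimal

# Hand-written tables (illustrative, not exhaustive).
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
MULTIPLIERS = {"十": 10, "百": 100, "千": 1000, "万": 10000}
DECIMAL_SEPARATOR = "点"  # U+70B9


def parse_chinese_number(text):
    """Parse [digit][multiplier]... with an optional 点 and plain digits."""
    integral, sep, fractional = text.partition(DECIMAL_SEPARATOR)

    total = 0
    pending = None  # digit waiting for its multiplier, if any
    for ch in integral:
        if ch in DIGITS:
            pending = DIGITS[ch]
        elif ch in MULTIPLIERS:
            # A bare multiplier (e.g. 十 for 10) counts as 1 * multiplier.
            total += (1 if pending is None else pending) * MULTIPLIERS[ch]
            pending = None
        else:
            raise ValueError("unexpected character: %r" % ch)
    if pending is not None:
        total += pending  # trailing units digit

    if not sep:
        return Decimal(total)

    # The fractional part uses regular digits, one per decimal place.
    if not fractional or any(ch not in DIGITS for ch in fractional):
        raise ValueError("invalid fractional part: %r" % fractional)
    frac = "".join(str(DIGITS[ch]) for ch in fractional)
    return Decimal("%d.%s" % (total, frac))


print(parse_chinese_number("一万二千三百四十五"))        # 12345
print(parse_chinese_number("一万二千三百四十五点六七"))  # 12345.67
```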

Regards,
Martin


