[Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions

Steven D'Aprano steve at pearwood.info
Sun Jun 9 13:38:25 CEST 2013


On 09/06/13 15:30, Guido van Rossum wrote:
> I'm beginning to feel that it was even a mistake to accept all those
> other Unicode decimal digits, because it leads to the mistaken belief
> that one can parse a number without knowing the locale.

You can parse unsigned integers, at least, without knowing the locale.

Decimal digits are unambiguously decimal digits, no matter the locale. "๒" \N{THAI DIGIT TWO} does not cease to be a character representing the number two just because you're not in Thailand, just as "2" does not cease to have the same meaning once you enter Thailand. People might not recognise those digits, but they are still unambiguous. Even the order of digits is, as far as I know, always big-endian (most significant digit on the left), even for right-to-left or bidirectional scripts.

So as I see it, int() should certainly support non-ASCII decimal digits.
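[And indeed it already does -- any character in Unicode category Nd (decimal digit) is accepted, for example:]

```python
import unicodedata

# int() accepts any character in Unicode category Nd (decimal digit),
# not just ASCII 0-9, and parses multi-digit strings big-endian:
assert int("๒") == 2        # THAI DIGIT TWO
assert int("٤٢") == 42      # ARABIC-INDIC DIGIT FOUR, ARABIC-INDIC DIGIT TWO
assert int("๑๒๓") == 123    # Thai digits one, two, three

# unicodedata exposes the decimal value each such character carries:
assert unicodedata.decimal("๒") == 2
```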

float() is a bit trickier, because of course here you do need to know the locale to tell whether . or , or · \N{MIDDLE DOT} is a radix point or an error. And I'm not sure what conventions there are for exponents. But even if float() is not fully locale-compliant, it seems rather silly for it to be more restrictive than int() -- since int('๒') correctly returns 2, I think it is reasonable for float('๒.๑') to return 2.1.
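[For what it's worth, float() does already accept non-ASCII digits, so long as the radix point is the ASCII full stop -- a localized separator is still rejected:]

```python
# float() transforms Unicode decimal digits to ASCII before parsing,
# so Thai digits work with an ASCII "." as the radix point:
assert float("๒.๑") == 2.1   # THAI DIGIT TWO, FULL STOP, THAI DIGIT ONE

# But a locale-specific radix point such as "," is not understood:
try:
    float("2,1")
except ValueError:
    pass  # as expected
else:
    raise AssertionError("expected ValueError for comma radix point")
```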

So +1 on the current behaviour for int, float and Decimal: I think they make the right compromises, without being excessively complex or unreasonably restrictive.

As far as supporting non-ASCII plus and minus signs goes, I'm keen in principle but lukewarm in practice. I think it would be a Nice To Have, and if somebody did the work to identify which characters should be accepted, I'd support adding it as a new feature. But I don't think that the lack of support for non-ASCII numeric signs is a bug.
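[In the meantime, anyone who needs this can normalise sign characters before converting. A minimal sketch -- the SIGNS mapping here is an illustration of the idea, not an exhaustive or authoritative list of which characters should be accepted:]

```python
# Hypothetical workaround: translate a few known non-ASCII sign
# characters to their ASCII equivalents, then let int() do the rest.
SIGNS = {
    "\N{MINUS SIGN}": "-",              # U+2212
    "\N{HYPHEN}": "-",                  # U+2010
    "\N{FULLWIDTH HYPHEN-MINUS}": "-",  # U+FF0D
    "\N{FULLWIDTH PLUS SIGN}": "+",     # U+FF0B
}
_SIGN_TABLE = str.maketrans(SIGNS)

def parse_int(s):
    """Parse an integer, tolerating a few non-ASCII sign characters."""
    return int(s.translate(_SIGN_TABLE))

assert parse_int("\N{MINUS SIGN}42") == -42   # plain int() rejects this
assert parse_int("๒") == 2                    # Unicode digits still work
```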



-- 
Steven
