[Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions

MRAB python at mrabarnett.plus.com
Sun Jun 9 02:45:30 CEST 2013


On 09/06/2013 00:21, David Mertz wrote:
> On Jun 8, 2013, at 3:52 PM, Guido van Rossum wrote:
>> Apologies, Python 3 does actually have limited support for the other
>> Unicode digits (actually only the ones marked "Decimal" IIUC). I'd
>> totally forgotten about that (since I still live primarily in an ASCII
>> world :-). E.g.
>
> This is cool, and I hadn't known about it.  I had just written a toy implementation of my own _float() to show a possible behavior.  Then looking at Guido's post, I find that:
>
> >>> import unicodedata
> >>> x = (
> ...   unicodedata.lookup('ARABIC-INDIC DIGIT ONE')+
> ...   unicodedata.lookup('ARABIC-INDIC DIGIT TWO')+
> ...   unicodedata.lookup('ARABIC-INDIC DIGIT THREE')+
> ...   "."+
> ...   unicodedata.lookup('ARABIC-INDIC DIGIT FOUR')+
> ...   unicodedata.lookup('ARABIC-INDIC DIGIT FIVE'))
> >>> x
> '١٢٣.٤٥'
> >>> float(x)
> 123.45
>
> ... my idea was to add an optional named argument like 'lang="Arabic"', but really it isn't needed since the digits MEAN the same thing in various scripts.  However, this DOES seem like arguably strange behavior:
>
> >>> x = ('123.'+
> ...   unicodedata.lookup('ARABIC-INDIC DIGIT FOUR')+
> ...   unicodedata.lookup('ARABIC-INDIC DIGIT FIVE'))
> >>> x
> '123.٤٥'
> >>> float(x)
> 123.45
>
> Not wrong, but possibly surprising.
>
FYI, you don't need to use 'unicodedata.lookup':

 >>> import unicodedata
 >>> '\N{ARABIC-INDIC DIGIT ONE}' == unicodedata.lookup('ARABIC-INDIC DIGIT ONE')
 True
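
As a rough sketch (not part of the original exchange, and assuming Python 3), the same \N{...} escapes can rebuild both of David's strings without any lookup calls. The names arabic, mixed and digits are only illustrative:

```python
import unicodedata

# Build the all-Arabic-Indic string with \N{...} escapes instead of
# unicodedata.lookup(); adjacent string literals are concatenated.
arabic = (
    '\N{ARABIC-INDIC DIGIT ONE}'
    '\N{ARABIC-INDIC DIGIT TWO}'
    '\N{ARABIC-INDIC DIGIT THREE}'
    '.'
    '\N{ARABIC-INDIC DIGIT FOUR}'
    '\N{ARABIC-INDIC DIGIT FIVE}'
)

# The mixed case: ASCII digits before the point, Arabic-Indic after it.
mixed = '123.' + '\N{ARABIC-INDIC DIGIT FOUR}\N{ARABIC-INDIC DIGIT FIVE}'

print(float(arabic))  # 123.45
print(float(mixed))   # 123.45

# unicodedata.decimal() gives the decimal value assigned to each digit
# character, which is why the script of the digit does not matter here.
digits = [unicodedata.decimal(ch) for ch in arabic if ch != '.']
print(digits)         # [1, 2, 3, 4, 5]
```

Both calls to float() give the same result because each character carries its own decimal value, regardless of script.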


