[Python-Dev] Python and the Unicode Character Database

Steven D'Aprano steve at pearwood.info
Tue Nov 30 14:23:22 CET 2010


Stephen J. Turnbull wrote:
> Lennart Regebro writes:
> 
>  > *I* think it is more important. In python 3, you can never ever assume
>  > anything is ASCII any more.
> 
> Sure you can.  In Python program text, all keywords will be ASCII
> (English, even, though it may be en_NL.UTF-8<wink>) for the foreseeable
> future.
> 
> I see no reason not to make a similar promise for numeric literals.  I
> see no good reason to allow compatibility full-width Japanese "ASCII"
> numerals or Arabic cursive numerals in "for i in range(...)" for
> example.

I agree with you that numeric *literals* should be restricted to the 
ASCII digits. I don't think anyone here is arguing differently -- if 
they are, they should speak up and try to make the case for allowing 
numeric literals in arbitrary scripts. Python doesn't currently allow 
non-ASCII numeric literals, and even if such a change were desirable, it 
would run up against the language moratorium (PEP 3003). So let's just forget the specter of 
code like:

x = math.sqrt(١٢٣٤.٥٦ ** 一.一)

It ain't gonna happen :)
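A quick way to confirm the current rule is to hand such source to the built-in compile(). This is a sketch: `compiles` is a made-up helper name, and any Python 3 should behave the same way. The ASCII spelling compiles, while the same number written with Arabic-Indic digits is rejected with a SyntaxError.

```python
def compiles(source):
    """Hypothetical helper: True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

print(compiles("x = 1234.56"))   # True: ASCII digits form a numeric literal
print(compiles("x = ١٢٣٤.٥٦"))   # False: Arabic-Indic digits are not valid in source
```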


But I think there is a good case for allowing the constructors int, 
float and complex to continue to accept numeric *strings* with non-ASCII 
digits. The code already exists, there are probably people out there who 
rely on it, and in the absence of any convincing demonstration that the 
existing behaviour is causing widespread difficulty, we should leave 
well enough alone.
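The existing behaviour is easy to check in Python 3. In the sketch below the sample strings are arbitrary choices. The constructors accept characters from the Unicode "decimal digit" category, not only ASCII, whereas a CJK numeral such as 一, which has a numeric value in the Unicode Character Database but is not a decimal digit, is still rejected.

```python
# Strings made of non-ASCII decimal digits are accepted by the numeric constructors.
arabic_indic = "١٢٣٤"   # U+0661..U+0664 ARABIC-INDIC DIGIT ONE..FOUR
fullwidth = "１２３"     # U+FF11..U+FF13 FULLWIDTH DIGIT ONE..THREE

print(int(arabic_indic))     # 1234
print(int(fullwidth))        # 123
print(float("١٢٣٤.٥٦"))      # 1234.56
print(complex("١+٢j"))       # (1+2j)

# CJK numerals have a numeric value but are not decimal digits,
# so int() refuses them.
try:
    int("一")
except ValueError as exc:
    print("rejected:", exc)
```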

Various people have suggested that there should be a function in the 
locale module that handles numeric string input in non-ASCII digits. 
This is a de facto admission that there are use-cases for taking user 
input like the string '٣' and turning it into the int 3. Python can 
already do this, and has been able to for many years:

[steve@sylar ~]$ python2.4
Python 2.4.6 (#1, Mar 30 2009, 10:08:01)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> int(u'٣')
3
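In Python 3 every str is Unicode, so the u prefix from that session is no longer needed. What int() relies on is the decimal-digit value recorded in the Unicode Character Database, which the standard unicodedata module exposes. The `to_ascii_digits` function below is purely hypothetical (it is not an existing API in locale or elsewhere); it sketches the kind of digit-normalizing helper a locale addition would duplicate.

```python
import unicodedata

def to_ascii_digits(text):
    """Hypothetical helper: replace each Unicode decimal digit with its ASCII
    equivalent and leave every other character unchanged."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None if ch is not a decimal digit
        out.append(str(value) if value is not None else ch)
    return "".join(out)

print(unicodedata.name("٣"))      # ARABIC-INDIC DIGIT THREE
print(unicodedata.decimal("٣"))   # 3
print(to_ascii_digits("٣.١٤"))    # 3.14
print(int("٣"))                   # 3, with no helper needed
```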

It seems to me that there's no need to move this functionality into locale.


-- 
Steven


