[Python-Dev] Python and the Unicode Character Database

Wed Dec 1 19:29:06 CET 2010

On 12/1/2010 12:55 PM, Alexander Belopolsky wrote:
> On Sun, Nov 28, 2010 at 5:48 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> ..
>>> With Python 3.1:
>>>
>>>>>> exec('\u0CF1 = 1')
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "<string>", line 1
>>>     ೱ = 1
>>>       ^
>>> SyntaxError: invalid character in identifier
>>>
>>> but with Python 3.2a4:
>>>
>>>>>> exec('\u0CF1 = 1')
>>>>>> eval('\u0CF1')
>>> 1
>>
>> Such changes are not new, but I agree that they should probably
>> be highlighted in the "What's new in Python x.x".
>>
>
> As of today, "What’s New In Python 3.2" [1] does not even mention the
> unicodedata upgrade to 6.0.0.  Here are the features from the
> unicode.org summary [2] that I think should be reflected in Python's
> "What's New" document:
>
> """
> * adds 2,088 characters, including over 1,000 additional symbols—chief
> among them the additional emoji symbols, which are especially
> important for mobile phones;
>
> * corrects character properties for existing characters including
>   - a general category change to two Kannada characters (U+0CF1,
> U+0CF2), which has the effect of making them newly eligible for
> inclusion in identifiers;
>
>   - a general category change to one New Tai Lue numeric character
> (U+19DA), which would have the effect of disqualifying it from
> inclusion in identifiers unless grandfathering measures are in place
> for the defining identifier syntax.
> """

> The above may be too verbose for inclusion in "What’s New In Python
> 3.2",

I think those 11 lines are pretty good. Put them in
('\N{CAT FACE WITH WRY SMILE}')!

Plus give a link to the Unicode site (issue numbers are implicit links).
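
For anyone who wants to check the identifier-related changes quoted above
on their own build, here is a minimal sketch. It is not from the thread:
the helper name describe() is made up for illustration, and what it prints
depends on the Unicode database version bundled with the interpreter you
run it on.

```python
import unicodedata


def describe(ch):
    """Return (code point label, general category, valid as a whole identifier)."""
    return ("U+%04X" % ord(ch), unicodedata.category(ch), ch.isidentifier())


# Version of the Unicode Character Database this interpreter was built with.
print("UCD version:", unicodedata.unidata_version)

# The two Kannada characters and the New Tai Lue character from the summary.
for ch in ("\u0CF1", "\u0CF2", "\u19DA"):
    print(describe(ch))

# A digit-like character could at most continue an identifier, never start
# one, so test U+19DA after an ordinary letter as well.
print("a + U+19DA is an identifier:", ("a" + "\u19DA").isidentifier())

# The emoji named above is one of the characters added in Unicode 6.0.
print("U+%04X" % ord(unicodedata.lookup("CAT FACE WITH WRY SMILE")))
```

On a build with the 6.0.0 data or later, the U+0CF1 line should report
category Lo and True, which matches the exec() result quoted at the top.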

-- 
Terry Jan Reedy