[Python-Dev] Python and the Unicode Character Database

Wed Dec 1 18:55:28 CET 2010

On Sun, Nov 28, 2010 at 5:48 PM, M.-A. Lemburg <mal at egenix.com> wrote:
..
>> With Python 3.1:
>>
>> >>> exec('\u0CF1 = 1')
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>>  File "<string>", line 1
>>    ೱ = 1
>>      ^
>> SyntaxError: invalid character in identifier
>>
>> but with Python 3.2a4:
>>
>> >>> exec('\u0CF1 = 1')
>> >>> eval('\u0CF1')
>> 1
>
> Such changes are not new, but I agree that they should probably
> be highlighted in the "What's new in Python x.x".
>

As of today, "What’s New In Python 3.2" [1] does not even mention the
unicodedata upgrade to 6.0.0.  Here are the features from the
unicode.org summary [2] that I think should be reflected in Python's
"What's New" document:

"""
* adds 2,088 characters, including over 1,000 additional symbols—chief
among them the additional emoji symbols, which are especially
important for mobile phones;

* corrects character properties for existing characters including
 - a general category change to two Kannada characters (U+0CF1,
U+0CF2), which has the effect of making them newly eligible for
inclusion in identifiers;

 - a general category change to one New Tai Lue numeric character
(U+19DA), which would have the effect of disqualifying it from
inclusion in identifiers unless grandfathering measures are in place
for the defining identifier syntax.
"""

The above may be too verbose for inclusion in "What’s New In Python
3.2", but I think we should add a shorter summary with a link to
unicode.org for details.
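
For anyone who wants to check what their own build does, here is a
rough sketch that prints the Unicode database version and the general
category of the three characters mentioned in the summary.  The helper
name describe() is made up for illustration, and the results depend on
the database version your interpreter was built with, so treat the
output as something to inspect rather than a guarantee:

```python
import unicodedata


def describe(ch):
    """Hypothetical helper: summarize one character's identifier-related data.

    Returns a tuple of (code point label, general category,
    whether the character alone is a valid identifier,
    whether it is valid after a leading ASCII letter).
    """
    return (
        "U+%04X" % ord(ch),
        unicodedata.category(ch),
        ch.isidentifier(),
        ("a" + ch).isidentifier(),
    )


# Version of the Unicode Character Database bundled with this interpreter.
print("Unicode database version:", unicodedata.unidata_version)

# The two Kannada characters and the New Tai Lue character from the summary.
for ch in ("\u0CF1", "\u0CF2", "\u19DA"):
    print(describe(ch))
```

The last column is there because of U+19DA: whether it is still
accepted inside an identifier depends on the grandfathering that the
unicode.org summary mentions, so it is worth checking rather than
assuming.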

PS: Yes, I think everyone should know about the Python 3.2 killer
feature: '\N{CAT FACE WITH WRY SMILE}'!

[1] http://docs.python.org/dev/whatsnew/3.2.html
[2] http://www.unicode.org/versions/Unicode6.0.0/