[Python-Dev] Unicode 5.1.0

Guido van Rossum guido at python.org
Thu Aug 21 22:35:20 CEST 2008


I was just paid a visit by my Google colleague Mark Davis, co-founder
of the Unicode project and the president of the Unicode Consortium. He
would like to see improved Unicode support for Python. (Well duh. :-)
On his list of top priorities are:

1. Upgrade the unicodedata module to the Unicode 5.1.0 standard
2. Extend the unicodedata module with some additional properties
3. Add support for Unicode properties to the regex syntax, including
Boolean combinations
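
For concreteness, here is a rough sketch of what these items look like
from the Python side. It assumes only the existing unicodedata module
(unidata_version and category()); the helper names are hypothetical, and
the last function merely emulates a Boolean property combination by
hand, since the regex syntax in item 3 does not exist yet.

```python
import unicodedata


def database_version():
    """Return the bundled Unicode database version as a tuple of ints."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


def chars_with_category(text, prefix):
    """Return the characters of text whose general category starts with prefix.

    Hypothetical helper: prefix "L" matches every letter category
    (Lu, Ll, Lt, Lm, Lo), while "Lu" matches uppercase letters only.
    """
    return [ch for ch in text if unicodedata.category(ch).startswith(prefix)]


def letters_not_uppercase(text):
    """Emulate a Boolean combination of properties: letter AND NOT uppercase."""
    return [
        ch
        for ch in text
        if unicodedata.category(ch).startswith("L")
        and unicodedata.category(ch) != "Lu"
    ]
```

Comparing database_version() against (5, 1, 0) is one way to tell
whether a given build already ships the upgraded database from item 1.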

I've tried to explain our release schedule and
no-new-features-in-point-releases policies to him, and he understands
that it's too late to add #2 or #3 to 2.6 and 3.0, and that these will
have to wait for 2.7 and 3.1, respectively. However, I've kept the
door slightly ajar for adding #1 -- it can't be too much work and it
can't have too much impact. Or can it? I don't actually know what the
impact would be, so I'd like some input from developers who are
closer to the origins of the unicodedata module.

The two, quite separate, questions, then, are (a) how much work would
it be to upgrade to version 5.1.0 of the database; and (b) would it be
acceptable to do this post-beta3 (but before rc1). If the answer to
(b) is positive, Google can help with (a).

In general, Google has needs in this area that can't wait for 2.7/3.1,
so what we may end up doing is create internal implementations of all
three features (compatible with Python 2.4 and later), publish them as
open source on Google Code, and fold them into core Python at the
first opportunity, which would likely be 2.7 and 3.1.

Comments?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

