Unicode 4.0 updates to unicodedata?

Fri Sep 19 00:52:00 EDT 2003

David Opstad <opstad at batnet.com> writes:

> Anyway, on to the question. Now that Unicode 4.0 has been released (just 
> got my copy today), any guesses on how long before the unicodedata 
> module will be updated to include all the new names?

It might happen for Python 2.4. By the time Python 2.4 is released,
though, a newer Unicode version may already be out, in which case the
Unicode 4.0 database might get skipped and Python might incorporate
Unicode 4.2 (or some such) instead.

The tricky part is that IDNA specifies Unicode 3.2 as the basis of
internationalized domain names, so some way must be found to ship
two versions of the database in Python without adding too much
overhead.
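
For illustration only: later Python releases ended up exposing a frozen
Unicode 3.2 database next to the current one, as unicodedata.ucd_3_2_0.
The sketch below assumes a Python 3 interpreter where that attribute
exists; the helper name is_assigned is made up for this example and is
not part of the module.

```python
import unicodedata

def is_assigned(db, ch):
    """Return True if `ch` has a category other than Cn (unassigned) in `db`."""
    return db.category(ch) != "Cn"

old_db = unicodedata.ucd_3_2_0   # frozen Unicode 3.2 data, the version IDNA refers to
new_db = unicodedata             # whatever version this interpreter ships

ch = "\u0221"  # LATIN SMALL LETTER D WITH CURL, added in Unicode 4.0
print(old_db.unidata_version, is_assigned(old_db, ch))
print(new_db.unidata_version, is_assigned(new_db, ch))
```

Both objects offer the same lookup functions, so code that must follow
IDNA can be handed old_db while everything else uses new_db.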

> How do things like that work, anyway; is there somebody whose task
> it is to update that, or are they awaiting volunteers to help out?

In general, it would be somebody's task (i.e. mine) to incorporate a
new version. However, since this takes more than rerunning the
generator (actual code changes have to go with it), contributions
are welcome.

> And once the module is updated, is it generally usable on earlier
> Python releases (I'm running the 2.2 that came with the OS X
> developer package for Jaguar)?

If you want to backport that database yourself, you could just as well
create your own version of the Unicode 4.0 database. Just run the
generator, and rename the unicodedata module to unicodedata40 (inside
the module's source code). Python then won't use this database
internally (for the .is* methods, .upper, and so on), but you can
readily call the unicodedata40 functions yourself.
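
A minimal sketch of how calling code might pick up such a renamed
module, assuming it was built and installed under the name
unicodedata40 (no such module ships with Python). The helper load_ucd
is hypothetical, and the code is written for a current Python 3
rather than 2.2. It falls back to the stock module if the renamed one
is not importable.

```python
import importlib
import unicodedata as stock_ucd

def load_ucd(module_name="unicodedata40"):
    """Return the renamed database module if importable, else the stock one."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return stock_ucd

ucd = load_ucd()

# Explicit lookups go through whichever module was found.
print(ucd.unidata_version)
print(ucd.name("\u00e9"))

# String methods such as "\u00e9".upper() or .isalpha() still use the
# database compiled into the interpreter, not the module chosen above.
```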

Regards,
Martin