[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a

Ezio Melotti report at bugs.python.org
Sun Aug 28 19:58:12 CEST 2011


Ezio Melotti <ezio.melotti at gmail.com> added the comment:

> Ideally, we need a "Unicode czar" -- a core developer whose job it is
> to keep track of Python's compliance with various parts and versions
> of the Unicode standard and who can nudge other developers towards
> fixing bugs or implementing features, or update the documentation in
> case things don't get added.

We should first do a full review of the latest Unicode standard and see what's missing.  I think there might be parts of older Unicode versions (even < Unicode 5) that are not yet implemented.  Chapter 3 is a good place to start, but I'm not sure that's enough; there are also a few TRs that should be considered.
If we manage to catch up with Unicode 6, then it shouldn't be too difficult to review the changes that each new version introduces and open an issue for each change (or a single issue if the changes are limited).
FWIW I'm planning to look at the conformance of the UTF codecs and fix them (if necessary) when I have time.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12731>
_______________________________________

