[docs] [issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a

Guido van Rossum report at bugs.python.org
Sun Aug 28 19:22:45 CEST 2011


Guido van Rossum <guido at python.org> added the comment:

[me]
>> But I really hope the re module (really: the _sre extension module)
>> can be fixed.

[Ezio]
> Starting to fix these issues from scratch doesn't make much sense IMHO.  We could "extract" the fixes from regex and merge them into re, but then again it's probably easier to just replace the whole module.

I have changed my mind at least half-way. I am open to having regex
(with some changes, details TBD) replace re in 3.3. (I am not yet 100%
convinced, but I'm not rejecting it as strongly as I was when I wrote
that comment in this bug. See the ongoing python-dev discussion on
this topic.)

>> We should also make a habit in our docs of citing specific versions
>> of the Unicode standard, and specific TR numbers and versions where
>> they apply.
>
> While this is a good thing, it's not always doable.  Usually someone reports a bug related to something specified in some standard and only that part gets fixed.  Sometimes everything else is also updated to follow the whole standard, but often this happens incrementally, so we can't say, e.g., "the re module supports Unicode x.y" unless we go through the whole standard and fix/implement everything.

Hm. I think that for Unicode it may actually be important enough to
follow the whole standard that we should attempt to be consistent
and not just chase bug reports. Now, we may consciously
decide not to implement a certain recommendation of the standard. E.g.
I'm not going to require that IronPython or Jython have string objects
that support O(1) indexing of code points, even though (assuming
PEP 393 gets accepted) CPython will have them. But these decisions should be made
explicitly, and documented clearly.

Ideally, we need a "Unicode czar" -- a core developer whose job it is
to keep track of Python's compliance with various parts and versions
of the Unicode standard and who can nudge other developers towards
fixing bugs or implementing features, or update the documentation in
case things don't get added. (I like Tom's approach to Java 1.7, where
he submitted proposed doc fixes explaining the deviations from the
standard. Perhaps a bit passive-aggressive, but it was effective. :-)

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12731>
_______________________________________

