[docs] [issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a

Antoine Pitrou report at bugs.python.org
Sat Aug 13 19:36:37 CEST 2011


Antoine Pitrou <pitrou at free.fr> added the comment:

> However, because the \w&c issues are bigger, Java addressed the tr18 RL1.2a
> issues differently, this time by creating a new compilation flag called
> UNICODE_CHARACTER_CLASS (with corresponding embedded "(?U)" regex flag.)
> 
> Truth be told, even Perl has secret pattern compilation flags to govern
> this sort of thing (ascii, locale, unicode), but we (well, I) hope you
> never have to use or even notice them.  
> 
> That too might be a route forward for Python, although I am not quite sure
> how much flexibility and control of your lexical scope you have.  However,
> the "from __future__" imports suggest you may have enough to do something
> slick so that only people who ask for it get it, and also importantly that
> they get it all over the place so don't have to add an extra flag or u'...'
> or whatever every single time.  

If the current behaviour is buggy or sub-optimal, I think we should
simply fix it (which might be done by replacing "re" with "regex" if
someone wants to shepherd its inclusion in the stdlib).
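
For reference, here is a minimal sketch of what a flag-governed \w looks
like with the flags the stdlib "re" module already has. It assumes
Python 3, where str patterns are Unicode-aware by default and re.ASCII
opts back into the ASCII-only sense. The sample string and variable names
are made up for illustration, and this says nothing about whether the
default \w fully satisfies RL1.2a.

```python
import re

# Sample text with precomposed non-ASCII letters (i-diaeresis, e-acute).
text = "na\u00efve caf\u00e9"

# Default for str patterns in Python 3: \w uses Unicode word characters.
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so non-ASCII letters split words.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# The same restriction expressed as an embedded flag inside the pattern.
inline_ascii_words = re.findall(r"(?a)\w+", text)

print(unicode_words)       # ['naïve', 'café']
print(ascii_words)         # ['na', 've', 'caf']
print(inline_ascii_words)  # ['na', 've', 'caf']
```

The embedded "(?a)" form plays roughly the same role as the Java "(?U)"
flag quoted above, only in the opposite direction: it narrows \w
instead of widening it.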

By the way, thanks for the detailed explanations, Tom.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12731>
_______________________________________

