[docs] [issue18779] Misleading documentations and comments in regular expression HOWTO

Mon Aug 19 12:19:36 CEST 2013

Vajrasky Kok added the comment:

In Lib/re.py, starting from line 77 (Python 3.4):

    \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_]
             in bytes patterns or string patterns with the ASCII flag.
             In string patterns without the ASCII flag, it will match the
             range of Unicode alphanumeric characters (letters plus digits
             plus underscore).
             With LOCALE, it will match the set [0-9_] plus characters defined
             as letters for the current locale.

The prelude is "Matches any alphanumeric character;".

Yet in every case (bytes patterns, string patterns with the ASCII flag, string patterns without the ASCII flag, and patterns with LOCALE), the underscore is included.
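
A quick check along these lines illustrates the point. It is only a sketch: the names `patterns`, `matches_underscore` and `results` are made up for illustration, and it assumes a recent Python 3, where the LOCALE flag is accepted only for bytes patterns, so the LOCALE case is tested with a bytes pattern rather than a string pattern.

```python
import re

# One \w pattern per mode mentioned in the docstring.
# Assumption: recent Python 3, where re.LOCALE requires a bytes pattern.
patterns = {
    "bytes": re.compile(rb"\w"),
    "str with ASCII flag": re.compile(r"\w", re.ASCII),
    "str without ASCII flag": re.compile(r"\w"),
    "bytes with LOCALE flag": re.compile(rb"\w", re.LOCALE),
}


def matches_underscore(pattern):
    """Return True if the compiled pattern matches a lone underscore."""
    # Pick a subject of the same type (bytes or str) as the pattern.
    subject = b"_" if isinstance(pattern.pattern, bytes) else "_"
    return pattern.fullmatch(subject) is not None


results = {name: matches_underscore(p) for name, p in patterns.items()}

for name, matched in results.items():
    print(f"{name}: \\w matches '_' -> {matched}")
```

If the underscore is matched in every mode, each entry in `results` is True, which is what the proposed wording is meant to make explicit.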

Then why don't we change the prelude to "Matches any alphanumeric character and underscore character;"? The description already explains that what counts as alphanumeric depends on whether the pattern is Unicode or not: it can be [A-Za-z0-9] or a wider set.

The description is already fine, but the prelude misleads readers.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18779>
_______________________________________