[issue2650] re.escape should not escape underscore

Morten Lied Johansen report at bugs.python.org
Thu Jun 26 16:45:12 CEST 2008


Morten Lied Johansen <mortenjo at ifi.uio.no> added the comment:

One issue with the current implementation, which I can't see has 
been mentioned here, is that it breaks UTF-8 characters (and probably 
characters in every other multi-byte encoding).

An é character in a UTF-8 encoded string is represented by two 
bytes. When passed through re.escape, those two bytes are checked 
individually; both are considered non-alphanumeric and are 
consequently escaped, turning the UTF-8 string into complete 
gibberish.
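
A minimal sketch of the effect, written for Python 3. It does not call 
re.escape itself, since the behaviour of re.escape may differ between 
Python versions. Instead it uses a hypothetical helper, 
old_style_escape, which is only an assumption about the rule described 
above: put a backslash in front of every byte that is not an ASCII 
letter or digit.

```python
# Hypothetical stand-in for the escaping rule described in this comment.
# It is NOT the real re.escape; it only backslash-escapes every byte
# that is not an ASCII letter or digit.
ASCII_ALNUM = frozenset(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
)


def old_style_escape(data: bytes) -> bytes:
    out = bytearray()
    for byte in data:  # iterating over bytes yields integers
        if byte not in ASCII_ALNUM:
            out.append(0x5C)  # backslash
        out.append(byte)
    return bytes(out)


encoded = "é".encode("utf-8")        # two bytes: 0xC3 0xA9
escaped = old_style_escape(encoded)  # a backslash now precedes each byte

# The lead byte 0xC3 is no longer followed by its continuation byte,
# so the result cannot be decoded as UTF-8 any more.
try:
    escaped.decode("utf-8")
    still_valid_utf8 = True
except UnicodeDecodeError:
    still_valid_utf8 = False

print(encoded, escaped, still_valid_utf8)
```

The backslash inserted between the two bytes is what separates the 
lead byte from its continuation byte.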

----------
nosy: +mortenlj

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2650>
_______________________________________

