[ python-Bugs-1500179 ] re.escape incorrectly escape literal.

SourceForge.net noreply at sourceforge.net
Mon Jun 5 00:17:30 CEST 2006


Bugs item #1500179, was opened at 2006-06-03 19:32
Message generated for change (Settings changed) made by gbrandl
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1500179&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Regular Expressions
Group: Python 2.4
Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Baptiste Lepilleur (blep)
Assigned to: Gustavo Niemeyer (niemeyer)
Summary: re.escape incorrectly escape literal.

Initial Comment:
Using Python 2.4.2.

Here is a small program excerpt that reproduces the
issue (attached):
---
import re
literal = r'E:\prg\vc'
print 'Expected:', literal
print 'Actual:', re.sub('a', re.escape(literal), 'a' )
assert re.sub('a', re.escape(literal), 'a' ) == literal
---
And the output of the sample:
---
Expected: E:\prg\vc
Actual  : E\:\prg\vc
Traceback (most recent call last):
  File "re_escape_bug.py", line 5, in ?
    assert re.sub('a', re.escape(literal), 'a' ) == literal
AssertionError
---

Looking at the regular expression syntax in the Python
documentation, I don't see why ':' is escaped as '\:'.

Baptiste.

----------------------------------------------------------------------

Comment By: Baptiste Lepilleur (blep)
Date: 2006-06-03 21:45

Message:
Logged In: YES 
user_id=196852

You are correct. However, the 'repl' string parameter is not
a literal string; it is interpreted. The correct way to
preserve the literal is literal.replace('\\','\\\\'), not
re.escape(). This prevents any interpretation of the repl
pattern. I believe this should be clearly stated in the
documentation, as it is not that obvious.

The following assertion passes:
---
import re
literal = r'e:\prg\vc\1'
assert re.sub( '(a+)', 
               literal.replace('\\','\\\\'), 
               'aabac' ) == (literal+'b'+literal+'c')
---

In the above example, neither \v nor \1 is interpreted.
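
A minimal sketch of two ways to insert a literal replacement with
re.sub, for comparison. The first is the backslash-doubling approach
described above. The second passes a callable as 'repl', whose return
value is not processed as a replacement template. The variable names
(escaped_repl, result_escaped, result_callable) are illustrative only
and not part of the original report.

```python
import re

literal = r'e:\prg\vc\1'

# Approach 1: double every backslash so the replacement template
# expands back to the original literal text.
escaped_repl = literal.replace('\\', '\\\\')
result_escaped = re.sub('(a+)', escaped_repl, 'aabac')

# Approach 2: pass a callable as 'repl'. The string it returns is
# inserted as-is, with no escape or group-reference processing.
result_callable = re.sub('(a+)', lambda match: literal, 'aabac')

# Both results contain the literal unchanged around 'b' and before 'c'.
expected = literal + 'b' + literal + 'c'
```

The callable form avoids having to remember which characters the
replacement template treats specially, at the cost of a small
function call per match.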

Regards,
Baptiste.


----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2006-06-03 20:27

Message:
Logged In: YES 
user_id=11375

The assertion is wrong, I think.   The signature is re.sub(pattern, replacement, 
string), so the assertion is replacing 'a' with re.escape(literal), which is 
obviously not going to equal literal.

re.escape() puts a backslash in front of all non-alphanumeric characters; ':' is 
non-alphanumeric, so it will be escaped.  The regex parser will ignore 
unknown escapes, so \: is the same as : -- the redundant escaping is 
harmless.
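
As an illustration of the point above, here is a short sketch of
re.escape() used where it is intended, in the pattern argument, so
that the escaped text matches the literal exactly. The sample string
and variable names are assumptions made for this example. Note that
newer Python releases escape fewer characters than 2.4 did, so the
exact output of re.escape() may differ, but the match result below
should be the same either way.

```python
import re

literal = r'E:\prg\vc'

# Escape the literal so that backslashes (and any other special
# characters) are matched as ordinary text in the pattern.
pattern = re.compile(re.escape(literal))
found = pattern.search('path=' + literal + ';')
matched_text = found.group(0)

# A redundant escape in a pattern is harmless: '\:' matches ':'.
redundant_match = re.match(r'E\:', 'E:')
```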


----------------------------------------------------------------------
