[ python-Bugs-1500179 ] re.escape incorrectly escape literal.
SourceForge.net
noreply at sourceforge.net
Mon Jun 5 00:17:30 CEST 2006
Bugs item #1500179, was opened at 2006-06-03 19:32
Message generated for change (Settings changed) made by gbrandl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1500179&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Regular Expressions
Group: Python 2.4
Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Baptiste Lepilleur (blep)
Assigned to: Gustavo Niemeyer (niemeyer)
Summary: re.escape incorrectly escape literal.
Initial Comment:
Using Python 2.4.2.
Here is a small program excerpt that reproduces the
issue (attached):
---
import re
literal = r'E:\prg\vc'
print 'Expected:', literal
print 'Actual:', re.sub('a', re.escape(literal), 'a' )
assert re.sub('a', re.escape(literal), 'a' ) == literal
---
And the output of the sample:
---
Expected: E:\prg\vc
Actual : E\:\prg\vc
Traceback (most recent call last):
  File "re_escape_bug.py", line 5, in ?
    assert re.sub('a', re.escape(literal), 'a' ) == literal
AssertionError
---
Looking at the regular expression syntax in the Python
documentation, I don't see why ':' is escaped as '\:'.
Baptiste.
----------------------------------------------------------------------
Comment By: Baptiste Lepilleur (blep)
Date: 2006-06-03 21:45
Message:
Logged In: YES
user_id=196852
You are correct. However, the 'repl' string parameter is not
a literal string; it is interpreted. The correct way to
preserve the literal is literal.replace('\\','\\\\'), not
re.escape(). It prevents any interpretation of the repl
pattern. I believe this fact should be clearly stated in the
documentation, as it is not that obvious.
The following assertion passes:
---
import re
literal = r'e:\prg\vc\1'
assert re.sub( '(a+)',
               literal.replace('\\','\\\\'),
               'aabac' ) == (literal+'b'+literal+'c')
---
In the above example, neither \v nor \1 is interpreted.
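For comparison, here is a minimal sketch of two ways to insert a literal
replacement, written for Python 3 rather than the 2.4 used in this report.
The names escaped, result1, result2 and expected are only illustrative. The
second variant relies on the documented behaviour that a function passed as
'repl' returns the replacement string, which is inserted without backslash
processing.
---
import re

literal = r'e:\prg\vc\1'

# Option 1: double every backslash so the template parser treats
# each one as a literal backslash instead of an escape or group reference.
escaped = literal.replace('\\', '\\\\')
result1 = re.sub('(a+)', escaped, 'aabac')

# Option 2: pass a function as repl; its return value is used as-is,
# so no escaping of the literal is needed.
result2 = re.sub('(a+)', lambda m: literal, 'aabac')

expected = literal + 'b' + literal + 'c'
assert result1 == expected
assert result2 == expected
---
The function form avoids having to remember which characters the
replacement template treats specially.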
Regards,
Baptiste.
----------------------------------------------------------------------
Comment By: A.M. Kuchling (akuchling)
Date: 2006-06-03 20:27
Message:
Logged In: YES
user_id=11375
The assertion is wrong, I think. The signature is re.sub(pattern, replacement,
string), so the assertion is replacing 'a' with re.escape(literal), which is
obviously not going to equal literal.
re.escape() puts a backslash in front of all non-alphanumeric characters; ':' is
non-alphanumeric, so it will be escaped. The regex parser will ignore
unknown escapes, so \: is the same as : -- the redundant escaping is
harmless.
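A short sketch of the intended use of re.escape(), assuming Python 3.4 or
later (for re.fullmatch): the escaped string is meant to be used as the
pattern, where it matches the original literal exactly. The names pattern,
matched and colon are only illustrative. Newer Python versions escape fewer
characters than 2.4 did, so the sketch does not depend on whether ':' itself
gets a backslash.
---
import re

literal = r'E:\prg\vc'

# Used as a pattern, the escaped string matches the literal text itself.
pattern = re.escape(literal)
matched = re.fullmatch(pattern, literal)
assert matched is not None

# In a pattern, a redundant escape such as \: matches a plain colon.
colon = re.match(r'E\:', 'E:')
assert colon is not None
---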
----------------------------------------------------------------------