re.sub unexpected behaviour
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Tue Jul 6 13:32:07 EDT 2010
On Tue, 06 Jul 2010 19:10:17 +0200, Javier Collado wrote:
> Hello,
>
> Let's imagine that we have a simple function that generates a
> replacement for a regular expression:
>
> def process(match):
>     return match.string
>
> If we use that simple function with re.sub using a simple pattern and a
> string we get the expected output:
> re.sub('123', process, '123')
> '123'
>
> However, if the string passed to re.sub contains a trailing new line
> character, then we get an extra new line character unexpectedly:
> re.sub(r'123', process, '123\n')
> '123\n\n'
I don't know why you say it is unexpected. The regex "123" matched the
first three characters of "123\n". Those three characters are replaced by
whatever process() returns, and match.string is the whole string you are
searching, "123\n", not just the matched text. That gives "123\n\n",
exactly as expected.
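To make the distinction concrete, here is a minimal sketch (not part of the original exchange) that inspects the match object directly. It assumes the poster may have wanted the matched text, which is match.group(0), rather than match.string:

```python
import re

m = re.search(r'123', '123\n')

whole = m.string      # the entire string that was searched
matched = m.group(0)  # only the text the pattern matched
span = m.span()       # (start, end) of the match within m.string

print(repr(whole), repr(matched), span)
```

A replacement function that returns match.group(0) would leave the input unchanged, since each match is replaced by itself.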
Perhaps these examples might help:
>>> re.sub('W', process, 'Hello World')
'Hello Hello Worldorld'
>>> re.sub('o', process, 'Hello World')
'HellHello World WHello Worldrld'
Here's a simplified pure-Python equivalent of what you are doing:
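def replace_with_match_string(target, s):
    # Simplified: handles only the first occurrence of target,
    # whereas re.sub replaces every match.
    n = s.find(target)
    if n != -1:
        # Splice a copy of the whole string in place of the target.
        s = s[:n] + s + s[n+len(target):]
    return s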
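The function above handles a single occurrence. As a rough sketch of the multi-match case, assuming a literal (non-regex) and non-empty target, the same effect can be had with str.split and str.join. The name replace_all_with_subject is made up for this illustration:

```python
import re

def process(match):
    return match.string

def replace_all_with_subject(target, s):
    # s.split(target) yields the pieces between non-overlapping occurrences
    # of target; joining them with s puts a copy of s where each one was.
    return s.join(s.split(target))

for target, subject in [('W', 'Hello World'), ('o', 'Hello World'), ('123', '123\n')]:
    via_split = replace_all_with_subject(target, subject)
    via_re = re.sub(re.escape(target), process, subject)
    print(repr(via_split), via_split == via_re)
```

re.escape is used only so that the literal target is treated as plain text by the regex engine.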
> If we try to get the same result using a replacement string, instead of
> a function, the strange behaviour cannot be reproduced:
> re.sub(r'123', '123', '123')
> '123'
>
> re.sub('123', '123', '123\n')
> '123\n'
The regex "123" matches the first three characters of "123\n", which are
then replaced by "123", giving "123\n", exactly as expected.
>>> re.sub("o", "123", "Hello World")
'Hell123 W123rld'
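As a further illustration (a sketch, not something from the original post), a replacement function that returns a fixed string behaves the same as passing that string directly, which suggests the difference lies in what the function returns rather than in using a function at all. The name constant_replacement is hypothetical:

```python
import re

def constant_replacement(match):
    # Ignores the match object and always returns the same text.
    return "123"

with_string = re.sub("o", "123", "Hello World")
with_function = re.sub("o", constant_replacement, "Hello World")

print(with_string, with_function, with_string == with_function)
```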
--
Steven