problem with newlines in regexp substitution

James Stroud jstroud at ucla.edu
Thu Feb 23 16:10:36 EST 2006


Florian Schulze wrote:
> See the following results:
> 
> Python 2.3.5 (#62, Feb  8 2005, 16:23:02) [MSC v.1200 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> 
> >>> import re
> >>> s = "1"
> >>> re.sub('1','\\n',s)
> '\n'
> >>> '\\n'
> '\\n'
> >>> re.sub('1',r'\\n',s)
> '\\n'
> >>> s.replace('1','\\n')
> '\\n'
> >>> repl = '\\n'
> >>> re.sub('1',repl,s)
> '\n'
> >>> s.replace('1',repl)
> '\\n'
> 
> Why is the behaviour of the regexp substitution so weird and can I 
> prevent that? It breaks my assumptions and thus my code.
> 
> Regards,
> Florian Schulze
> 

"Why" questions are always tough to answer. E.g.: Why are we here?

The answer to "what is happening" is much easier. The replacement
string passed to re.sub is not used as-is: the regex engine processes
backslash escapes in it first, so a backslash that should survive must
itself be escaped. Your '\\n' reaches re.sub as two characters, a
backslash and an 'n', and the engine turns that pair into a newline.
This is why raw strings were invented. If it weren't for these, I'd
still be using Perl. In a raw string, as you have noticed, a '\' stands
for itself, so r'\\n' hands the engine an escaped backslash followed by
'n'. In the olden days, you'd have to type "\\\\" to mean a literal
backslash, so creating a literal backslash in a regex that produced a
string that would then itself be used in a regex would be
'\\\\\\\\\\\\\\\\', which scared me away from Python for a couple of
years (remember, the final printed product would be '\').
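
As for preventing it, here is a minimal sketch of two common ways to
make re.sub leave the replacement alone. It is written in Python 3
syntax (an assumption; the session quoted above is Python 2.3), and the
variable names are only illustrative. Passing a function as the
replacement skips the escape processing, and so does doubling every
backslash beforehand.

```python
import re

s = "1"
repl = "\\n"  # two characters: a backslash followed by 'n'

# A string replacement is treated as a template, so re.sub expands \n.
expanded = re.sub("1", repl, s)

# A callable replacement is inserted verbatim; no template processing.
literal_via_func = re.sub("1", lambda match: repl, s)

# Alternatively, double each backslash before handing the string over.
literal_via_escape = re.sub("1", repl.replace("\\", "\\\\"), s)

print(repr(expanded))
print(repr(literal_via_func))
print(repr(literal_via_escape))
```

The function form is probably the safer choice when the replacement
text comes from somewhere you don't control, since nothing in it gets
reinterpreted.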

That patently doesn't answer your question, but here is something to ponder:

py> s.replace('1',repl)[0]
'\\'
py> print s.replace('1',repl)
\n
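
The two results above differ only in how they are displayed: the
interactive prompt echoes repr(), which doubles the backslash, while
print writes the characters themselves. A small sketch of the same
point, again in Python 3 syntax with an illustrative variable name:

```python
s = "1"
repl = "\\n"  # a backslash followed by 'n'

# str.replace does no escape processing on the replacement.
replaced = s.replace("1", repl)

print(len(replaced))   # number of characters in the result
print(repr(replaced))  # the form the interactive prompt echoes
print(replaced)        # the characters themselves
```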

James


