re.sub
Tim Chase
python.list at tim.thechases.com
Tue Oct 16 14:38:28 EDT 2007
> Let me show you a very bad consequence of this...
>
> a=open('file1.txt','rb').read()
> b=re.sub('x',a,'x')
> open('file2.txt','wb').write(b)
>
> Now if file1.txt contains a \n or \" then file2.txt is not the
> same as file1.txt while it should be.
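That's functioning as designed: re.sub() processes backslash
escapes in the replacement string. If you want the contents of
file1.txt treated as literal replacement text, escape them first.
re.escape() does this, though it is meant for patterns and can
leave extra backslashes in a replacement: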
http://docs.python.org/lib/node46.html#l2h-407
Or, you can specially treat newlines:
b=re.sub('x', a.replace('\n', '\\n'), 'x')
or just escape the backslashes in the incoming replacement string:
b=re.sub('x', a.replace('\\', '\\\\'), 'x')
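A minimal sketch of the difference, assuming a text string stands in
for the contents of file1.txt (the value of `a` below is made up for
illustration, and the function-replacement variant is an alternative
not mentioned above):

```python
import re

# Hypothetical stand-in for the data read from file1.txt:
# a literal backslash followed by "n", then some double quotes.
a = 'line\\n"end"'

# Passed directly, re.sub() interprets backslash-n as a newline.
naive = re.sub('x', a, 'x')

# Doubling the backslashes first keeps them literal.
escaped = re.sub('x', a.replace('\\', '\\\\'), 'x')

# A function replacement is used as returned, with no escape processing.
via_function = re.sub('x', lambda m: a, 'x')

print(naive == a)         # False
print(escaped == a)       # True
print(via_function == a)  # True
```

If the data is read in binary mode under Python 3, the pattern and
replacement would need to be bytes as well.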
In the help for the RE module's syntax, this is explicitly noted:
http://docs.python.org/lib/re-syntax.html
"""
If you're not using a raw string to express the pattern, remember
that Python also uses the backslash as an escape sequence in
string literals; if the escape sequence isn't recognized by
Python's parser, the backslash and subsequent character are
included in the resulting string. However, if Python would
recognize the resulting sequence, the backslash should be
repeated twice. This is complicated and hard to understand, so
it's highly recommended that you use raw strings for all but the
simplest expressions.
"""
The short upshot: "it's highly recommended that you use raw
strings for all but the simplest expressions."
Thus, the string that you pass as your regexp should be a regexp.
Not a "Python interpretation of a regexp before the regex engine
gets to touch it".
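To illustrate the raw-string point, here is a small sketch (the
pattern and sample text are invented for the example) showing how
Python's own escape handling changes what the regex engine receives:

```python
import re

# In a normal literal, Python turns \b into a backspace character
# before the regex engine ever sees the pattern.
plain_len = len('\bcat\b')   # backspace, c, a, t, backspace
raw_len = len(r'\bcat\b')    # backslash and "b" kept as two characters

text = 'cat concat cat'
plain_hits = re.findall('\bcat\b', text)   # looks for literal backspaces
raw_hits = re.findall(r'\bcat\b', text)    # \b is a word boundary

print(plain_len, raw_len)   # 5 7
print(plain_hits)           # []
print(raw_hits)             # ['cat', 'cat']
```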
-tkc