Correction: Re: string substitutions

John Machin sjmachin at lexicon.net
Sun Feb 24 23:12:18 EST 2002


Mike Dean <klaatu at evertek.net> wrote in message news:<mailman.1014583431.3100.python-list at python.org>...
> 
> And I had forgotten about the raw strings bit - one question about them
> though - if I'm using raw strings in my RE, is there a way to use a
> newline in an RE, or do I need to resort to regular strings for such a
> situation?
> 

REs were "designed" to be typed into text editors like the venerable
ed, not to be embedded in C or Python or whatever source code. Thus
the RE engine itself, as a user convenience, recognises a two-byte
escape sequence such as "\n" and transforms it into the single byte
0x0A (or maybe more precisely, the local representation of a newline).
This important fact is often not documented well or at all. Then you
have the extra layer of complexity when you embed an RE in program
source code -- the compiler will apply the same or similar
transformations. This leads to backslashorrhea unless the language
kindly gives you a no-escape-processing option like Python's r"...".
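To see the two layers separately, here is a minimal sketch. It assumes Python 3 rather than the 2.2 used in the session below, although these particular escapes behave the same way in both. The names `cooked`, `raw` and `text` are just illustrative.

```python
import re

# Layer 1: the Python compiler processes escapes in ordinary ("cooked")
# string literals, but leaves raw literals alone.
cooked = "\n"    # one character: the newline itself
raw = r"\n"      # two characters: a backslash followed by 'n'
print(len(cooked), len(raw))      # prints: 1 2

# Layer 2: the RE engine does its own escape processing, so the
# two-character sequence backslash-n also means "newline" to it.
text = "line one\nline two"
print(re.findall(cooked, text))   # prints: ['\n']
print(re.findall(raw, text))      # prints: ['\n']
```

Both patterns match the same newline: one arrives at the engine already converted, the other is converted by the engine itself.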

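So, you don't have a problem at all matching a newline with a raw-string
pattern: r"\n" reaches the RE engine as backslash-n, and the engine turns
that into a newline. It's a bit more difficult if you are trying to match
a literal backslash followed by n. Cut out the following and paste it
inside your hat: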

Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
>>> import re
>>> txt = r"literal backslash then literal n <\n>; newline <" + "\n" + ">"
>>> txt
'literal backslash then literal n <\\n>; newline <\n>'
>>> # effect of different numbers of backslashes in raw pattern
...
>>> re.sub(r"n", "*", txt)
'literal backslash the* literal * <\\*>; *ewli*e <\n>'
>>> re.sub(r"\n", "*", txt)
'literal backslash then literal n <\\n>; newline <*>'
>>> re.sub(r"\\n", "*", txt)
'literal backslash then literal n <*>; newline <\n>'
>>> re.sub(r"\\\n", "*", txt)
'literal backslash then literal n <\\n>; newline <\n>'
>>> # effect of different numbers of backslashes in cooked pattern
...
>>> re.sub("n", "*", txt)
'literal backslash the* literal * <\\*>; *ewli*e <\n>'
>>> re.sub("\n", "*", txt)
'literal backslash then literal n <\\n>; newline <*>'
>>> re.sub("\\n", "*", txt)
'literal backslash then literal n <\\n>; newline <*>'
>>> re.sub("\\\n", "*", txt)
'literal backslash then literal n <\\n>; newline <*>'
>>> re.sub("\\\\n", "*", txt)
'literal backslash then literal n <*>; newline <\n>'
>>> re.sub("\\\\\n", "*", txt)
'literal backslash then literal n <\\n>; newline <\n>'
>>>
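For anyone on a current Python 3, here is a minimal sketch that replays the raw-pattern half of the session as a script. It then adds re.escape, which backslash-escapes a string so that it can be matched literally, which means you don't have to count backslashes yourself. The variable names are illustrative only.

```python
import re

# Same subject string as in the session: a literal backslash-n, then a real newline.
txt = r"literal backslash then literal n <\n>; newline <" + "\n" + ">"

# Raw patterns: the RE engine sees exactly the backslashes typed.
results = {pat: re.sub(pat, "*", txt) for pat in (r"\n", r"\\n", r"\\\n")}
for pat, out in results.items():
    print(repr(pat), "->", repr(out))

# re.escape() backslash-escapes special characters, including the backslash
# itself, so the pattern it returns matches its argument literally.
literal = re.escape("\\n")   # argument is two characters: backslash, 'n'
escaped_result = re.sub(literal, "*", txt)
print(repr(literal), "->", repr(escaped_result))
```

The re.escape line gives the same result as the hand-written r"\\n": the library supplies the doubled backslash for you.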

Hope you weren't expecting a one-line answer :-)
