escapes in regular expressions

Paul McGuire ptmcg at austin.rr._bogus_.com
Sun May 21 14:52:13 EDT 2006


"James Thiele" <jamesthiele.usenet at gmail.com> wrote in message
news:1148233776.133200.59280 at u72g2000cwu.googlegroups.com...
> I was helping a guy at work with regular expressions and found
> something I didn't expect:
>
> >>> re.match('\d', '7').group()
> '7'
> >>> re.match('\\d', '7').group()
> '7'
> >>>
>
> It's not clear to me why these are the same. Could someone please
> explain?
>

This is not a feature of regexps at all, but of Python string literals.  If
a backslash precedes a character that does not form a recognized escape
sequence, the backslash is left in the string as an ordinary backslash.  Look
at this sample from the Python command line:

>>> s = "\d"
>>> s
'\\d'
>>> s = "\t"
>>> s
'\t'
>>>
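To see what each literal actually stores, it can help to look at lengths and
individual characters rather than the repr.  This is only a small sketch; the
variable names are made up for illustration, and it spells the first literal
as "\\d" because more recent Python versions emit a warning for an
unrecognized escape such as "\d".

```python
# Illustrative names only; not part of the original question.
backslash_d = "\\d"   # two characters: a backslash, then 'd'
tab = "\t"            # one character: TAB (code point 9)

print(len(backslash_d), list(backslash_d))  # 2 ['\\', 'd']
print(len(tab), list(tab))                  # 1 ['\t']

# repr() doubles the backslash for display; the string still holds only one.
print(repr(backslash_d))                    # '\\d'
```

The doubled backslash in the repr is just how Python displays a single
backslash, which is why s shows up as '\\d' in the session above.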

This is one reason why Python programmers who use regexps use the "raw"
notation to create strings (the result is often called a "raw string", which
is a bit of a misnomer: the resulting string is an ordinary string in every
respect - what is "raw" about it is that backslash escapes are not processed
in the literal; the one wrinkle is that the literal cannot end with a single
backslash).  It is painful enough to litter your regexp with backslashes,
just because you have the misfortune of having to match a '.', '+', '?', '*',
or brackets or parentheses in your expression, without having to double up
the backslashes for escaping purposes.  Consider these sample statements:

>>> "\d" == "\\d"
True
>>> "\t" == "\\t"
False
>>> r"\t" == "\\t"
True
>>>
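Here is a rough sketch of the same point applied to an actual regexp.  The
pattern and the sample text are invented for illustration; the only library
calls are re.search and re.match.

```python
import re

# The same pattern written both ways; the raw form types each backslash once.
raw_pattern = r"\d+\.\d+"
doubled_pattern = "\\d+\\.\\d+"

print(raw_pattern == doubled_pattern)   # True: identical strings

found = re.search(raw_pattern, "pi is roughly 3.14")
print(found.group())                    # 3.14

# "\t" and r"\t" are different strings, yet both match a TAB as regexps,
# because the re module interprets the backslash-t escape itself.
print(re.match("\t", "\t") is not None)    # True
print(re.match(r"\t", "\t") is not None)   # True
```

The last two lines show why sloppy escaping often goes unnoticed: sometimes
Python handles the escape and sometimes the re module does, and the match
comes out the same either way.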

So your question is really a string question - you just happened to trip
over it while defining a regexp.

-- Paul




