'\\' in regex affects the following parenthesis?

Paul McGuire ptmcg at austin.rr.com
Sat Apr 21 20:57:10 EDT 2007


On Apr 21, 6:56 pm, vox... at gmail.com wrote:
> Could someone tell me why:
>
> >>> import re
> >>> p = re.compile('\\.*\\(.*)')
>
> Fails with message:
>
> Traceback (most recent call last):
>   File "<pyshell#12>", line 1, in <module>
>     re.compile('\\dir\\(file)')
>   File "C:\Python25\lib\re.py", line 180, in compile
>     return _compile(pattern, flags)
>   File "C:\Python25\lib\re.py", line 233, in _compile
>     raise error, v # invalid expression
> error: unbalanced parenthesis
>
> I thought '\\' should just be interpreted as a single '\' and not
> affect anything afterwards...
>
> The script 'redemo.py' shipped with Python by default is just fine
> about this regex however.

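You are getting overlap between the Python string literal \ escaping
and re's \ escaping.  In a Python string literal '\\' gets collapsed
down to a single \, so the pattern that re actually receives is
\.*\(.*) and re reads the \( as an escaped, literal '('.  That leaves
the closing ')' with no opening partner, hence "unbalanced
parenthesis".  To get your desired result, you would need to
double-double every '\', as in: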

p = re.compile('\\\\.*\\\\(.*)')

Ugly, no?  Fortunately, Python has a special form for string literals,
called "raw", which suppresses Python's processing of \'s for escaping
- I think this was done expressly to help simplify entering re
strings.  To use raw format for a string literal, just precede the
opening quotation mark with an r.  Here is your original string,
written as a raw literal:

p = re.compile(r'\\.*\\(.*)')

This will compile ok.
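
If you want to see what re actually receives in each case, a quick
sketch like the one below may help.  The variable names and the sample
path '\dir\file' are just made up for illustration; only re.compile,
re.error and match come from the standard library.

```python
import re

# Three spellings of the pattern from this thread.
plain = '\\.*\\(.*)'        # what the original post used
doubled = '\\\\.*\\\\(.*)'  # every backslash double-doubled
raw = r'\\.*\\(.*)'         # raw literal

# What re actually receives once Python has processed each literal.
print(plain)    # \.*\(.*)
print(doubled)  # \\.*\\(.*)
print(raw)      # \\.*\\(.*)

# The plain form fails: \( is a literal paren, so ')' has no partner.
try:
    re.compile(plain)
    plain_error = None
except re.error as exc:
    plain_error = exc
print("plain failed:", plain_error is not None)

# The raw form compiles and captures the text after the last backslash.
# The sample path is hypothetical.
m = re.compile(raw).match(r'\dir\file')
print(m.group(1))  # file
```

Printing the string before handing it to re.compile is usually the
fastest way to spot this kind of escaping mix-up.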

(Sometimes these literals are referred to as "raw strings" - I think
this is confusing because new users think this is a special type of
string, different from str.  A raw literal creates the EXACT SAME type
of str; the r just tells the compiler/interpreter to handle the quoted
literal a little differently.  So I prefer to call them "raw
literals".)
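
A small check along these lines should confirm it (the names raw and
ordinary are only for this sketch):

```python
# The r prefix only changes how the source text is read;
# the resulting object is a plain str either way.
raw = r'\\(.*)'
ordinary = '\\\\(.*)'

print(type(raw))                    # <class 'str'>
print(type(raw) is type(ordinary))  # True
print(raw == ordinary)              # True
print(len(raw))                     # 6
```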

-- Paul



