A bug in Python's regular expression engine?

Just Another Victim of the Ambient Morality ihatespam at hotmail.com
Tue Nov 27 11:19:37 EST 2007


"Paul Hankin" <paul.hankin at gmail.com> wrote in message 
news:31047857-42ca-415e-83be-d1d360341ab0 at j20g2000hsi.googlegroups.com...
> On Nov 27, 3:48 pm, "Just Another Victim of the Ambient Morality"
> <ihates... at hotmail.com> wrote:
>>     This won't compile for me:
>>
>> regex = re.compile('(.*\\).*')
>>
>>     I get the error:
>>
>> sre_constants.error: unbalanced parenthesis
>>
>>     I'm running Python 2.5 on WinXP.  I've tried this expression with
>> another RE engine in another language and it works just fine, which leads
>> me to believe the problem is Python.  Can anyone confirm or deny this bug?
>
> Your code is equivalent to:
> regex = re.compile(r'(.*\).*')
>
> Written like this, it's easier to see that you've started a regular
> expression group with '(', but it's never closed since your closing
> parenthesis is escaped (which causes it to match a literal ')' when
> used). Hence the reported error (which isn't a bug).
>
> Perhaps you meant this?
> regex = re.compile(r'(.*\\).*')
>
> This matches any number of characters followed by a backslash (group
> 1), and then any number of characters. If you're using this to split
> filenames from Windows paths, you should look at os.path.split
> instead of writing your own.
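A small Python 3 sketch of the reply above (the thread used Python 2.5, where the traceback names the exception `sre_constants.error`; `re.error` is an alias for that class). The sample path `C:\Documents\notes.txt` is made up for illustration, and `ntpath` (the Windows implementation behind `os.path`) is used so the split behaves the same on any OS.

```python
import ntpath
import re

# The normal literal '(.*\\).*' and the raw literal r'(.*\).*' are the same string.
pattern = '(.*\\).*'
assert pattern == r'(.*\).*'

# The regex engine reads "\)" as a literal ")", so the group opened by "(" never closes.
try:
    re.compile(pattern)
except re.error as exc:
    print("re.error:", exc)

# Escaping the backslash for the regex engine gives the intended pattern.
regex = re.compile(r'(.*\\).*')
m = regex.match(r'C:\Documents\notes.txt')  # hypothetical example path
print(m.group(1))  # greedy .* backtracks to the last backslash: C:\Documents\

# The library route suggested in the reply: ntpath.split handles backslash paths.
print(ntpath.split(r'C:\Documents\notes.txt'))  # ('C:\\Documents', 'notes.txt')
```

Note that `ntpath.split` drops the trailing separator from the directory part, whereas the regex group keeps it.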

    Indeed, I did end up using the os.path functions instead.
    I think I see what's going on.  Backslash has a special meaning both in 
regular expressions and in Python string literals.  So, my version 
should have been something like this:


regex = re.compile('(.*\\\\).*')
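A quick Python 3 check of that reasoning: each backslash pair in a normal literal collapses to one backslash before `re` sees the string, so the four-backslash version and the raw version from the reply produce the identical pattern. The sample string `dir\file.txt` is only an illustration.

```python
import re

# In a normal literal each pair of backslashes becomes a single backslash.
plain = '(.*\\\\).*'
raw = r'(.*\\).*'
assert plain == raw  # both are the 8-character string (.*\\).*

# The regex engine then reads "\\" as one literal backslash, so both compile alike.
for source in (plain, raw):
    m = re.compile(source).match('dir\\file.txt')  # hypothetical sample input
    print(m.group(1))  # prints: dir\
```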


    That is funny.  Thank you for your help...
    Just for clarification, what does the "r" in your code do?
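For readers with the same question: the `r` prefix makes a raw string literal, in which Python does not treat backslash as the start of an escape sequence. A minimal Python 3 illustration, with arbitrary sample strings:

```python
import re

# Normal literal: "\\" is an escape for one backslash. Raw literal: two backslashes.
print(len('\\'), len(r'\\'))  # 1 2
print('a\tb' == r'a\tb')      # False: '\t' is a tab, r'\t' is backslash + 't'
print(r'a\tb' == 'a\\tb')     # True

# re then applies its own backslash rules to the string it receives:
# the pattern a\\b means "a", one literal backslash, "b".
print(bool(re.fullmatch(r'a\\b', 'a\\b')))  # True
```

This is why raw strings are the usual choice for regular expressions: only the regex engine's escaping rules apply. One quirk is that a raw literal still cannot end in a single backslash, so `r'C:\dir\'` is a syntax error.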






More information about the Python-list mailing list