A bug in Python's regular expression engine?

Paul Hankin paul.hankin at gmail.com
Tue Nov 27 11:07:17 EST 2007


On Nov 27, 3:48 pm, "Just Another Victim of the Ambient Morality"
<ihates... at hotmail.com> wrote:
>     This won't compile for me:
>
> regex = re.compile('(.*\\).*')
>
>     I get the error:
>
> sre_constants.error: unbalanced parenthesis
>
>     I'm running Python 2.5 on WinXP.  I've tried this expression with
> another RE engine in another language and it works just fine which leads me
> to believe the problem is Python.  Can anyone confirm or deny this bug?

Your code is equivalent to:
regex = re.compile(r'(.*\).*')

Written like this, it's easier to see that you've opened a regular
expression group with '(' but never closed it: the closing parenthesis
is escaped, so it matches a literal ')' instead of ending the group.
Hence the reported error, which isn't a bug.
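
A quick way to check this for yourself is the sketch below. The variable
names are only for illustration, and the exact error text differs between
Python versions, so it is captured rather than compared:

```python
import re

# The doubled backslash in a plain string literal and the single
# backslash in a raw string literal produce the same 7-character pattern.
plain = '(.*\\).*'
raw = r'(.*\).*'
same_pattern = (plain == raw)

# Compiling it fails because the only ')' is escaped, so the group
# opened by '(' is never closed.
try:
    re.compile(raw)
    compile_error = None
except re.error as exc:
    compile_error = str(exc)

# An escaped parenthesis on its own is a valid pattern: it matches a
# literal ')' character.
literal_paren = re.match(r'\)', ')') is not None
```

Here same_pattern and literal_paren come out True, and compile_error holds
whatever message your Python version reports for the unclosed group.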

Perhaps you meant this?
regex = re.compile(r'(.*\\).*')

This matches any number of characters followed by a backslash (group
1), and then any number of characters. Because '.*' is greedy, group 1
runs up to and including the last backslash in the string. If you're
using this to split Windows paths into directory and filename, you
should look at os.path.split instead of writing your own.
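
For comparison, here is a rough sketch of both approaches. The path is
made up for the example, and ntpath is used directly (it is the module
behind os.path on Windows) so that the result is the same on any
platform:

```python
import re
import ntpath  # the Windows flavour of os.path, importable everywhere

# Hypothetical path, only for illustration.
path = r'C:\Users\paul\notes.txt'

# Regex approach: group 1 is everything up to and including the last
# backslash, because the first '.*' is greedy.
regex = re.compile(r'(.*\\).*')
match = regex.match(path)
prefix = match.group(1)

# Library approach: returns (directory, filename) and drops the
# separator between them.
head, tail = ntpath.split(path)
```

With this input, prefix keeps its trailing backslash, while head and tail
are the directory and the filename with no separator left to strip.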

HTH
--
Paul Hankin


