Pathological regular expression

David Liang bmdavll at gmail.com
Thu Apr 9 06:17:59 EDT 2009


On Apr 9, 2:56 am, David Liang <bmda... at gmail.com> wrote:
> Hi all,
> I'm having a weird problem with a regular expression (tested in 2.6
> and 3.0):
>
> Basically, any of these:
> _re_comments = re.compile(r'^(([^\\]+|\\.|"([^"\\]+|\\.)*")*)#.*$')
> _re_comments = re.compile(r'^(([^#]+|\\.|"([^"\\]+|\\.)*")*)#.*$')
> _re_comments = re.compile(r'^(([^"]+|\\.|"([^"\\]+|\\.)*")*)#.*$')
>
> followed by for example,
> line = r'~/.[m]ozilla/firefox/*.default/chrome'
> print(_re_comments.sub(r'\1', line))
>
> ...hangs the interpreter. For reference, if the first command had been
> _re_comments = re.compile(r'^(([^z]+|\\.|"([^"\\]+|\\.)*")*)#.*$')
>
> (off by one character z) it works fine, and all the equivalent
> operations work in sed and awk. Am I missing something about Python
> RE's?
>
> -David

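The problem was the redundant +'s. With a + inside a starred group, as
in ([^#]+|...)*, a run of ordinary characters can be split up in
exponentially many ways, and since the test line contains no #, the
engine tries every split before giving up. (The [^z] version stops at
the z in "mozilla", so it only backtracks over a handful of characters.)
With the +'s removed, each position can be consumed in only one way;
the fixed RE is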

    _re_comments = re.compile(r'^(([^#"\\]|\\.|"([^"\\]|\\.)*")*)#.*')
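
For anyone who wants to check the fixed RE, here is a minimal sketch.
The strip_comment helper and the sample lines other than the original
path are my own illustrations, not part of the original script; the
intent is only to show that a line without a # now returns immediately
and that a # inside double quotes or after a backslash is left alone.

```python
import re

# The fixed pattern from above: every alternative starts with a different
# character class, so there is only one way to consume each position.
_re_comments = re.compile(r'^(([^#"\\]|\\.|"([^"\\]|\\.)*")*)#.*')

def strip_comment(line):
    """Hypothetical helper: drop a trailing #-comment, keeping what precedes it."""
    return _re_comments.sub(r'\1', line)

# The line that used to hang; it has no '#', so it comes back unchanged.
path = r'~/.[m]ozilla/firefox/*.default/chrome'
print(strip_comment(path))

# A plain trailing comment is removed.
print(repr(strip_comment('foo # bar')))

# A '#' inside double quotes or after a backslash is not a comment start.
print(repr(strip_comment('say "a # b" # c')))
print(repr(strip_comment(r'a \# b # c')))
```

Note that the text before the # is kept verbatim, including any
trailing whitespace.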


