non-terminating regex match

MRAB google at mrabarnett.plus.com
Wed Apr 2 15:11:12 EDT 2008


On Apr 2, 5:01 pm, Maurizio Vitale <m... at cuma.polymath-solutions.lan>
wrote:
> Has to be something really stupid, but the following never finish
> (running Python 2.5.1 (r251:54863, Jan 10 2008, 18:00:49)
> [GCC 4.2.1 (SUSE Linux)] on linux2).
>
> The intention is to match C++ identifiers, with or without namespace
> qualification, with or without arguments (e.g. variables, functions and
> macros).
> The following should be accepted:
>     main
>     main(int,char**)
>     ::main
>     std::cout
>     ::std::cout
>     NDEBUG
>
> Thanks for any help.
> And yes, I'm a total beginner when it comes to Python, but it seems
> very strange to me that a regex match on a finite length string
> doesn't terminate
> Regards,
>
>         Maurizio
>
> #!/usr/bin/env python
> # -*- Python -*-
>
> import re
>
> if __name__ == '__main__':
>         r = re.compile (
>             r'(?:(?P<scope>(?:(?:::)?\w+)*)::)?'
>             r'(?P<name>\w+)'
>             r'(?:\((?P<arguments>[^\)]*)\))?'
>             )
>         match = r.search ('WITH_ALOHA_EXCEPTION_HANDLERS')

I think the problem is with this bit: '(?:(?:::)?\w+)*'. The '::' is
optional and also in a repeated group, so if it tries to match, say,
'abc' it can try and then backtrack all of these possibilities: abc,
ab c, a bc, a b c. The longer the string, the more possibilities to
try. Try this instead:

    r = re.compile (
        r'(?P<scope>(?:::)?(?:\w+::)*)?'
        r'(?P<name>\w+)'
        r'(?:\((?P<arguments>[^\)]*)\))?'
        )



More information about the Python-list mailing list