non-terminating regex match
MRAB
google at mrabarnett.plus.com
Wed Apr 2 15:11:12 EDT 2008
On Apr 2, 5:01 pm, Maurizio Vitale <m... at cuma.polymath-solutions.lan>
wrote:
> Has to be something really stupid, but the following never finish
> (running Python 2.5.1 (r251:54863, Jan 10 2008, 18:00:49)
> [GCC 4.2.1 (SUSE Linux)] on linux2).
>
> The intention is to match C++ identifiers, with or without namespace
> qualification, with or without arguments (e.g. variables, functions and
> macros).
> The following should be accepted:
> main
> main(int,char**)
> ::main
> std::cout
> ::std::cout
> NDEBUG
>
> Thanks for any help.
> And yes, I'm a total beginner when it comes to Python, but it seems
> very strange to me that a regex match on a finite length string
> doesn't terminate
> Regards,
>
> Maurizio
>
> #!/usr/bin/env python
> # -*- Python -*-
>
> import re
>
> if __name__ == '__main__':
> r = re.compile (
> r'(?:(?P<scope>(?:(?:::)?\w+)*)::)?'
> r'(?P<name>\w+)'
> r'(?:\((?P<arguments>[^\)]*)\))?'
> )
> match = r.search ('WITH_ALOHA_EXCEPTION_HANDLERS')
I think the problem is with this bit: '(?:(?:::)?\w+)*'. The '::' is
optional and also in a repeated group, so if it tries to match, say,
'abc' it can try and then backtrack all of these possibilities: abc,
ab c, a bc, a b c. The longer the string, the more possibilities to
try. Try this instead:
r = re.compile (
r'(?P<scope>(?:::)?(?:\w+::)*)?'
r'(?P<name>\w+)'
r'(?:\((?P<arguments>[^\)]*)\))?'
)
More information about the Python-list
mailing list