Freeze problem with Regular Expression
Peter Pearson
ppearson at nowhere.invalid
Thu Jun 26 12:20:01 EDT 2008
On 25 Jun 2008 15:20:04 GMT, Kirk <noreply at yahoo.com> wrote:
> Hi All,
> the following regular expression matching seems to enter an infinite
> loop:
>
> ################
> import re
> text = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una '
> re.findall('[^A-Z|0-9]*((?:[0-9]*[A-Z]+[0-9|a-z|\-]*)+\s*[a-z]*\s*(?:[0-9]*[A-Z]+[0-9|a-z|\-]*\s*)*)([^A-Z]*)$', text)
> #################
>
> No problem in Perl with the same expression:
>
> #################
> $s = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una ';
> $s =~ /[^A-Z|0-9]*((?:[0-9]*[A-Z]+[0-9|a-z|\-]*)+\s*[a-z]*\s*(?:[0-9]*[A-Z]+[0-9|a-z|\-]*\s*)*)([^A-Z]*)$/;
> print $1;
> #################
>
> I have Python 2.5.2 on Ubuntu 8.04.
> Any idea?
If it will help some smarter person identify the problem, it can
be simplified to this:
    re.findall(r'[^X]*((?:0*X+0*)+\s*a*\s*(?:0*X+0*\s*)*)([^X]*)$',
               "XXXXXXXXXXXXXXXXX (X")
This doesn't actually hang; it just takes a long time. The
time taken increases quickly as the chain of X's gets longer.
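One way to see the growth for yourself is a small timing loop over
the simplified pattern above. This is only a sketch, assuming
Python 3 (time.perf_counter is not available in 2.5); time_findall is
a made-up helper name. My guess at the cause, not something
established in this thread, is that nested quantifiers such as
(?:0*X+0*)+ let the engine split a run of X's in many different ways,
and it tries them all before giving up at each starting position.

```python
import re
import time

# Simplified pattern from the message above, written as a raw string.
PATTERN = re.compile(r'[^X]*((?:0*X+0*)+\s*a*\s*(?:0*X+0*\s*)*)([^X]*)$')


def time_findall(n):
    """Return (seconds, matches) for a run of n X's followed by ' (X'."""
    text = 'X' * n + ' (X'
    start = time.perf_counter()
    matches = PATTERN.findall(text)
    return time.perf_counter() - start, matches


if __name__ == '__main__':
    # Keep n small: the running time grows very quickly with n.
    for n in range(6, 15, 2):
        elapsed, matches = time_findall(n)
        print('n=%2d  %.4f s  %r' % (n, elapsed, matches))
```

The times should climb steeply as n grows, while the result stays
[('X', '')]: the only successful match starts at the space before
the parenthesis, after every earlier starting position has failed.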
HTH
--
To email me, substitute nowhere->spamcop, invalid->net.