Freeze problem with Regular Expression

Peter Pearson ppearson at nowhere.invalid
Thu Jun 26 12:20:01 EDT 2008


On 25 Jun 2008 15:20:04 GMT, Kirk <noreply at yahoo.com> wrote:
> Hi All,
> the following regular expression matching seems to enter an infinite
> loop:
>
> ################
> import re
> text = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una '
> re.findall('[^A-Z|0-9]*((?:[0-9]*[A-Z]+[0-9|a-z|\-]*)+\s*[a-z]*\s*(?:[0-9]*[A-Z]+[0-9|a-z|\-]*\s*)*)([^A-Z]*)$', text)
> #################
>
> No problem with perl with the same expression:
>
> #################
> $s = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una ';
> $s =~ /[^A-Z|0-9]*((?:[0-9]*[A-Z]+[0-9|a-z|\-]*)+\s*[a-z]*\s*(?:[0-9]*[A-Z]+[0-9|a-z|\-]*\s*)*)([^A-Z]*)$/;
> print $1;
> #################
>
> I have Python 2.5.2 on Ubuntu 8.04.
> Any idea?

If it will help some smarter person identify the problem, it can
be simplified to this:

re.findall(r'[^X]*((?:0*X+0*)+\s*a*\s*(?:0*X+0*\s*)*)([^X]*)$',
            "XXXXXXXXXXXXXXXXX (X" )

This doesn't actually hang, it just takes a long time.  The
time taken increases quickly as the chain of X's gets longer.
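
For anyone who wants to watch the growth, here is a rough timing
sketch. It assumes Python 3 (time.perf_counter is not available in
2.5), and the helper name time_findall is made up for this
illustration. A likely culprit is the X+ nested inside (?:...)+: a
run of n X's can be split among repetitions in about 2**(n-1) ways,
and when the tail of the pattern fails at the "(" a backtracking
engine may try every split before giving up.

```python
import re
import time

# Simplified pattern from above, written as a raw string.
PATTERN = re.compile(r'[^X]*((?:0*X+0*)+\s*a*\s*(?:0*X+0*\s*)*)([^X]*)$')

def time_findall(n):
    """Hypothetical helper: time findall on n X's followed by ' (X'.

    Returns (elapsed_seconds, findall_result).
    """
    text = "X" * n + " (X"
    start = time.perf_counter()
    result = PATTERN.findall(text)
    return time.perf_counter() - start, result

if __name__ == "__main__":
    # Keep n small; each extra X costs noticeably more time.
    for n in range(8, 15, 2):
        elapsed, result = time_findall(n)
        print("%2d X's: %.4f s  %r" % (n, elapsed, result))
```

The absolute numbers depend on the machine and the Python version;
the thing to look at is how fast they climb as n goes up.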

HTH

-- 
To email me, substitute nowhere->spamcop, invalid->net.


