Freeze problem with Regular Expression

John Machin sjmachin at lexicon.net
Wed Jun 25 18:29:38 EDT 2008


On Jun 26, 1:20 am, Kirk <nore... at yahoo.com> wrote:
> Hi All,
> the following regular expression matching seems to enter an infinite
> loop:
>
> ################
> import re
> text = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una '
> re.findall('[^A-Z|0-9]*((?:[0-9]*[A-Z]+[0-9|a-z|\-]*)+\s*[a-z]*\s*(?:[0-9]*[A-Z]+[0-9|a-z|\-]*\s*)*)([^A-Z]*)$', text)
> #################
>
[expletives deleted]
>
> I have Python 2.5.2 on Ubuntu 8.04.
> Any idea?

Several problems:
(1) lose the vertical bars (as advised by others)
(2) ALWAYS use a raw string for regexes; your \s* happens to survive
because Python leaves the unknown escape \s untouched, but \b, for one,
silently turns into a backspace
(3) why are you using findall on a pattern that ends in "$"?
(4) using non-verbose regexes of that length means you haven't got a
petrol drum's hope in hell of understanding what's going on
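(5) too many nested variable-length patterns (e.g. [A-Z]+ inside a
(...)+ group); when the trailing "$" can't match, the engine backtracks
through every way of splitting the text, which takes a finite (but very
long) time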
(6) as remarked by others, you haven't said what you are trying to do;
what it actually is doing doesn't look sensible (see below).
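
A minimal sketch of points (1) and (2), using a made-up sample string
rather than the original text. It only shows that "|" inside a character
class is a literal character, and that a non-raw string has its
backslash escapes processed before the re module ever sees the pattern.
The variable names are illustrative, not from the original post.

```python
import re

sample = 'a|1 b2'   # hypothetical sample, not the original text

# (1) Inside [...] a vertical bar is an ordinary character, not alternation,
#     so the first class also accepts a literal "|".
with_bars = re.findall(r'[0-9|a-z]+', sample)
without_bars = re.findall(r'[0-9a-z]+', sample)
print(with_bars)
print(without_bars)

# (2) In a normal string, "\b" is a single backspace character;
#     in a raw string it stays as backslash + "b", the regex word boundary.
plain = '\bMSX\b'
raw = r'\bMSX\b'
print("%d %d" % (len(plain), len(raw)))

plain_hits = re.findall(plain, 'MSX ITALIA')   # looks for literal backspaces
raw_hits = re.findall(raw, 'MSX ITALIA')       # looks for the whole word MSX
print(plain_hits)
print(raw_hits)
```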

The following code is after fixing problems 1, 2, 3 and 4:

C:\junk>type infinitere.py
import re
text = ' MSX INTERNATIONAL HOLDINGS ITALIA srl (di seguito MSX ITALIA) una '
regex0 = r"""
[^A-Z0-9]*              # match leading space
(
    (?:
        [0-9]*          # match nothing
        [A-Z]+          # match "MSX"
        [0-9a-z\-]*     # match nothing
    )+                  # match "MSX"
    \s*                 # match " "
    [a-z]*              # match nothing
    \s*                 # match nothing
    (?:
        [0-9]*
        [A-Z]+
        [0-9a-z\-]*
        \s*
    )*                  # match "INTERNATIONAL HOLDINGS ITALIA "
)
([^A-Z]*)               # match "srl (di seguito "
"""
regex1 = regex0 + "$"
for rxno, rx in enumerate([regex0, regex1]):
    mobj = re.compile(rx, re.VERBOSE).match(text)
    if mobj:
        print rxno, mobj.groups()
    else:
        print rxno, "failed"

C:\junk>infinitere.py
0 ('MSX INTERNATIONAL HOLDINGS ITALIA ', 'srl (di seguito ')
### taking a long time, interrupted
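
To get a feel for why the "$" version takes so long, here is a rough
sketch using a cut-down pattern: just the first group of the regex
above followed by "$". The cut-down pattern, the test subjects and the
helper name time_failed_match are assumptions made for illustration,
not part of the original code. Each subject is a run of capitals
followed by "!", so the match must fail, and the engine tries the many
ways of splitting the run between repetitions of the group before
giving up. The time should grow steeply as letters are added; actual
figures depend on the machine and the Python version.

```python
import re
import time

# Hypothetical cut-down pattern: the first group of the posted regex plus "$".
ambiguous = re.compile(r'(?:[0-9]*[A-Z]+[0-9a-z\-]*)+$')

def time_failed_match(n):
    """Return (match result, elapsed seconds) for n capitals followed by '!'."""
    subject = 'A' * n + '!'          # the '!' guarantees that "$" cannot match
    start = time.time()
    result = ambiguous.match(subject)
    return result, time.time() - start

results = []
for n in (12, 15, 18):               # kept small so the loop finishes quickly
    result, seconds = time_failed_match(n)
    results.append(result)
    print("%2d letters: %s in %.4f s" % (n, result, seconds))
```

The same pattern matches a plain "MSX" immediately; the cost only
shows up when the overall match fails and the alternatives have to be
exhausted.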

HTH,
John


