Python RegExp
Fredrik Lundh
fredrik at pythonware.com
Tue Mar 22 18:41:33 EST 2005
davidchipping at gmail.com wrote:
>I have a string which I wish to match using RE, however when I run my
> comparison (using the match method) on my machine it never returns,
> using the CPU fully.
>
> The code is (very simply):
>
> ------------------------------
> import re
>
> buffer = r"#1 1 550 111 SYNC_PEER RES
> <YP>syncpeers=(id=54325432;add=10." \
>
> "0.0.1;port=89;up=89),(id=97899;add=10.0.0.1;port=543;up=543)," \
> "(id=54325432;add=10.0.0.1;port=89;up=8"
>
> p = re.compile(r'#(?P<tran>[0-9]+\s)(?P<sess>[0-9]+\s)' \
> + '(?P<ndto>[0-9]+\s)(?P<ndfr>[0-9]+\s)' \
> + '(?P<cmd>[a-zA-Z_]+\s)(?P<dir>[a-zA-Z_]+)' \
> + '(?P<parm>(\s<[a-zA-Z]+>(\s*[a-zA-Z]+=[a-zA' \
> + '-Z0-9.,=();]+\s*)+<[a-zA-Z]+>)*)#(?P<left>.*$)')
>
> print "starting the match"
> rst = (p.match (buffer) != None)
> print "finishing the match"
> --------------------------------
>
> Should this every actually happen? Regardless of whether the statment
> matches or not, surely it should do this?
surely the following should never be slow?
n = len(s)
for a in range(n):
for b in range(n):
for c in range(n):
for d in range(n):
for e in range(n):
... etc ...
(your RE contains several nested and/or related repeat operators,
which forces the poor engine to test zillions of combinations before
giving up. see Friedl's RE book for more information on this)
I suggest using a real parser for this task (or using the RE engine as
a tokenizer, and doing the rest in code; see
http://effbot.org/zone/xml-scanner.htm for more on this)
</F>
More information about the Python-list
mailing list