Why do look-ahead and look-behind have to be fixed-width patterns?

John Machin sjmachin at lexicon.net
Thu Jan 27 22:27:13 EST 2005


inhahe wrote:
> Hi i'm a newbie at this and probably always will be, so don't be surprised
> if I don't know what i'm talking about.
>
> but I don't understand why regex look-behinds (and look-aheads) have to be
> fixed-width patterns.
>
> i'm getting the impression that it's supposed to make searching
> exponentially slower otherwise
>
> but i just don't see how.
>
> say i have the expression (?<=.*?:.*?:).*
> all the engine has to do is search for .*?:.*?:.*, and then in each result,
> find .*?:.*?: and return the string starting at the point just after the
> length of the match.
> no exponential time there, and even that is probably more inefficient than
> it has to be.

But that's not what you are telling it to do. You are telling it to
firstly find each position which starts a match with .* -- i.e. every
position -- and then look backwards to check that the previous text
matches .*?:.*?:
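
To make the point above concrete, here is a rough sketch rather than a description of how the re module works internally. The first part checks that Python's re refuses a variable-width look-behind at compile time. The second part emulates what such a look-behind would have to do: at every candidate position, try every stretch of preceding text. The names compiles, PREFIX and emulate_lookbehind are made up for illustration, and the sketch assumes Python 3.4 or later for fullmatch.

```python
import re

def compiles(pattern):
    """Return True if the re module accepts the pattern, False on re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r'(?<=ab:).*'))        # fixed-width look-behind: accepted
print(compiles(r'(?<=.*?:.*?:).*'))   # variable-width look-behind: rejected

PREFIX = re.compile(r'.*?:.*?:')      # the would-be look-behind body

def emulate_lookbehind(s):
    """Hypothetical helper: return (position, matched text) or None."""
    for pos in range(len(s) + 1):      # .* can start a match at every position
        for start in range(pos + 1):   # try every stretch of text ending at pos
            if PREFIX.fullmatch(s, start, pos):
                return pos, re.match(r'.*', s[pos:]).group()
    return None

print(emulate_lookbehind('a:b:yadda'))
```

The nested loop is the extra work the look-behind asks for: each position where .* could start needs its own backwards check, and a variable-width pattern gives the engine no fixed place to begin that check.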

To grab the text after the 2nd colon (if indeed there are two or more),
it's much simpler to do this:

>>> import re
>>> q = re.compile(r'.*?:.*?:(.*)').search
>>> def grab(s):
...    m = q(s)
...    if m:
...       print(m.group(1))
...    else:
...       print('not found!')
...
>>> grab('')
not found!
>>> grab('::::')
::
>>> grab('a:b:yadda')
yadda
>>> grab('a:b:c:d')
c:d
>>> grab('a:b:')

>>>
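
For comparison, the same results can be had without a regex at all. This is only a sketch under the assumption that the input is a single line (the regex above stops at a newline because . does not match one, whereas split does not care). The name grab_split is made up here, and it returns the text instead of printing it.

```python
def grab_split(s):
    """Return the text after the second colon, or None if there are fewer than two."""
    parts = s.split(':', 2)   # at most two splits, so at most three pieces
    return parts[2] if len(parts) == 3 else None

for sample in ['', '::::', 'a:b:yadda', 'a:b:c:d', 'a:b:']:
    print(repr(sample), '->', repr(grab_split(sample)))
```

Returning None for the "not found" case keeps it distinct from the empty string that 'a:b:' yields.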



