re.match -- not greedy?
Noah Rawlins
noah.rawlins at comcast.net
Sun Nov 19 19:59:45 EST 2006
Carl Banks wrote:
> EXI-Andrews, Jack wrote:
>> the '*' and '+' don't seem to be greedy.. they will consume less in
>> order to match a string:
>>
>>>>> import re;re.match('(a+)(ab)','aaab').groups()
>> ('aa', 'ab')
>>
>> this is the sort of behaviour i'd expect from
>> '(a+?)(ab)'
>>
>> a+ should greedily consume a's at the expense of the string not matching
>
> It's called backtracking. If a match fails, the regexp engine
> recursively unwinds some of the greedy matching and tries again.
>
>> in real life, i am trying to match #defines:
>>
>>>>> re.match( '#define ([A-Za-z0-9_]+)([^(])', '#define
>> abc(d)').groups()
>> ('ab', 'c')
>>
>> i want this example to fail because the first character after a string
>> of letters is a '('
>> i want to match only #defines without parameters.
>
> Probably the easiest thing to do is to attempt to match zero or one
> opening parentheses, and bail out if it did end up matching the
> parenthesis.
>
> m = re.match(r"#define ([A-Za-z0-9_]+)(\(?)","#define abc(d)")
> if m and m.groups(2) != "(":
> ...
>
> (BTW, it's good practice to use a raw strings for the regular
> expressions as I did.)
>
Another way that seems clearer to me is to use negative lookahead
assertions.
>>>defPatt = re.compile(r'#define\s+(\w+)\b(?!\()')
>>> defPatt.match("#define abc(x)")
>>> defPatt.match("#define foo_BarBaz")
<_sre.SRE_Match object at 0xb7dd5820>
>>> defPatt.match("#define foo_BarBaz").groups()
('foo_BarBaz',)
In general '\w' is the same as [A-Za-z0-9_]
There are other considerations too... I don't know if
#define abc (x)
But the main thing here is the use of '\b' to require there to be a word
boundary at the end your match and a (?!\() to require that the match is
not followed by a '('
see http://docs.python.org/lib/re-syntax.html
noah
>> so what's the definition of greedy?
>
> It means match as much you can *while getting a match*.
>
> Just remember regexp parsing is not a one way street: if it hits a dead
> end, it'll go back and try another path. In fact, greediness usually
> doesn't affect *whether* a regexp matches; it only affects the
> groupings. I'm suddenly curious if there are any cases at all where
> greediness changes whether it finds a match.
>
>> (pls copy me on responses)
>
> Ah, too bad you won't see it then....
>
>
> Carl Banks
>
More information about the Python-list
mailing list