Match First Sequence in Regular Expression?
Tim Chase
python.list at tim.thechases.com
Thu Jan 26 14:23:47 EST 2006
>> "xyz123aaabbaaabab"
>>
>> where you have "aaab" in there twice.
>
> Good suggestion.
I assumed that this would be a valid case. If not, the
expression would need tweaking.
>> ^([^b]|((?<!a)b))*aaab+[ab]*$
>
> Looks good, although I've been unable to find a good
> explanation of the "negative lookbehind" construct "(?<". How
> does it work?
The beginning part of the expression
([^b]|((?<!a)b))*
breaks down as
[^b] anything that isn't a "b"
| or
(...) this other thing
where "this other thing" is
(?<!a)b a "b" as long as it isn't immediately
preceded by an "a"
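A quick way to see what that prefix accepts is to try it on its own. This is only a sketch using Python's re module. The sample strings are made up for illustration, and I've made the outer group non-capturing with (?:...) since nothing here uses the capture. The name PREFIX is just a local variable for the example.

```python
import re

# Just the leading "([^b]|((?<!a)b))*" part of the pattern.
# (?:...) is a non-capturing group; behaviour is the same as (...) here.
PREFIX = re.compile(r"(?:[^b]|(?<!a)b)*")

for s in ["xyz123", "xyzbb", "bbb", "xyab", "ab"]:
    # fullmatch() succeeds only if the *whole* string is built from
    # "non-b characters" and "b's not preceded by an a"
    print(repr(s), PREFIX.fullmatch(s) is not None)
# prints True for the first three strings and False for the last two,
# since only "xyab" and "ab" contain an "a" directly followed by a "b"
```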
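The "(?<!...)" construct means that the "..." portion can't appear
immediately before the following token in the regexp (in this
case, immediately before the "b"). It only checks the text behind
the current position; it doesn't consume any characters itself.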
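To watch the lookbehind in isolation, here's a small sketch with re.finditer on a made-up string. It shows which "b"s qualify and that the match is only the "b" itself, because the lookbehind is zero-width. The last few lines are a Python-specific aside: as far as I know, Python's re only accepts fixed-width lookbehinds, so something like "(?<!a+)" is rejected at compile time.

```python
import re

text = "abcbab"  # made-up sample: b's at indices 1, 3 and 5

# (?<!a)b : a "b" whose preceding character is not "a".
# Only the b at index 3 (preceded by "c") qualifies; the lookbehind
# checks the "c" but does not include it in the match.
hits = [(m.start(), m.group()) for m in re.finditer(r"(?<!a)b", text)]
print(hits)  # [(3, 'b')]

# Python's re requires a fixed-width lookbehind; "a+" is variable-width.
try:
    re.compile(r"(?<!a+)b")
except re.error as exc:
    print("rejected:", exc)
```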
There's also a "negative lookahead", "(?!...)" (rather than
"lookbehind"), which requires that the "..." portion does not
come immediately after. This should be usable in this scenario as
well and works with the aforementioned tests, using
"^([^a]|(a(?!b)))*aaab+[ab]*$"
which would be "anything that's not an 'a'; or an 'a' as long as
it's not followed by a 'b'"
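If you want to check that the two versions agree, something along these lines will do. It's a sketch: the first string is the one from earlier in this thread, but the others are hypothetical inputs I made up, not the original tests. LOOKBEHIND and LOOKAHEAD are just local names for the two patterns quoted above.

```python
import re

# The two variants discussed in this thread.
LOOKBEHIND = re.compile(r"^([^b]|((?<!a)b))*aaab+[ab]*$")
LOOKAHEAD = re.compile(r"^([^a]|(a(?!b)))*aaab+[ab]*$")

# First string is from the thread; the rest are made-up examples:
#   "xyz123aaab" -> "aaab" is the first "ab", should match
#   "xabaaab"    -> an earlier "ab" appears before "aaab", should fail
#   "xyz123aab"  -> no "aaab" at all, should fail
samples = ["xyz123aaabbaaabab", "xyz123aaab", "xabaaab", "xyz123aab"]

for s in samples:
    behind = LOOKBEHIND.match(s) is not None
    ahead = LOOKAHEAD.match(s) is not None
    print(f"{s!r:22} lookbehind={behind} lookahead={ahead}")
```

Both columns should come out True, True, False, False for these four strings.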
The gospel is at:
http://docs.python.org/lib/re-syntax.html
but is a bit terse. O'Reilly has a fairly good book on regexps if
you want to dig a bit deeper.
-tkc