Match First Sequence in Regular Expression?

Tim Chase python.list at tim.thechases.com
Thu Jan 26 14:23:47 EST 2006


>> "xyz123aaabbaaabab"
>> 
>> where you have "aaab" in there twice.
> 
> Good suggestion.

I assumed that this would be a valid case.  If not, the
expression would need tweaking.

>> ^([^b]|((?<!a)b))*aaab+[ab]*$
> 
> Looks good, although I've been unable to find a good
> explanation of the "negative lookbehind" construct "(?<".  How
> does it work?

The beginning part of the expression

	([^b]|((?<!a)b))*

breaks down as

	[^b]        anything that isn't a "b"
	|           or
	(...)       this other thing

where "this other thing" is

	(?<!a)b     a "b" as long as it isn't immediately
	            preceded by an "a"

The "(?<!...)" construct is a zero-width assertion: it consumes no
characters itself, and only succeeds if the "..." portion does not
match immediately before the following token in the regexp...in this
case, before a "b".
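
As a rough illustration (a minimal sketch, not one of the tests from
this thread; the sample text and variable names are made up here),
Python's re module can show which "b" characters survive the
lookbehind, and that the whole prefix group refuses to cross an "ab":

```python
import re

# A "b" only counts when the character right before it is not an "a".
not_after_a = re.compile(r"(?<!a)b")

text = "ab bb cb b"  # hypothetical sample text
positions = [m.start() for m in not_after_a.finditer(text)]
print(positions)  # [3, 4, 7, 9] -- the "b" at index 1 follows an "a"

# The prefix part of the expression: any run of characters
# that never contains the sequence "ab".
prefix = re.compile(r"([^b]|((?<!a)b))*")
print(bool(prefix.fullmatch("xyz123")))  # True
print(bool(prefix.fullmatch("xbyb")))    # True, no "b" follows an "a"
print(bool(prefix.fullmatch("xab")))     # False, the "b" follows an "a"
```

Note that a "b" at the very start of the string also passes, since
there is nothing before it for the "a" to match.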

There's also a "negative lookahead" (rather than "lookbehind") 
which prevents items from following.  This should be usable in 
this scenario as well and works with the aforementioned tests, using

	"^([^a]|(a(?!b)))*aaab+[ab]*$"

which would be "anything that's not an 'a'; or an 'a' as long as 
it's not followed by a 'b'"
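
A quick way to compare the two forms side by side, as a sketch only:
the first sample is the string quoted at the top of this message,
while the second is a hypothetical counter-example added here, in
which the first "ab" has only two "a"s in front of it.

```python
import re

# The two expressions from this message, unchanged.
lookbehind = re.compile(r"^([^b]|((?<!a)b))*aaab+[ab]*$")
lookahead = re.compile(r"^([^a]|(a(?!b)))*aaab+[ab]*$")

samples = [
    "xyz123aaabbaaabab",  # from the quoted message: first "ab" is in "aaab"
    "xyz123aabbaaab",     # hypothetical: first "ab" is in "aab"
]

# Map each sample to (lookbehind matched?, lookahead matched?).
results = {
    s: (bool(lookbehind.match(s)), bool(lookahead.match(s)))
    for s in samples
}

for s, (behind, ahead) in results.items():
    print(s, behind, ahead)
# xyz123aaabbaaabab True True
# xyz123aabbaaab False False
```

Both forms agree on these two inputs; that says nothing about other
inputs, so it's worth feeding in whatever edge cases matter to you.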

The gospel is at:
http://docs.python.org/lib/re-syntax.html

but is a bit terse.  O'Reilly has a fairly good book on regexps if 
you want to dig a bit deeper.

-tkc





