Match First Sequence in Regular Expression?

Tim Chase python.list at tim.thechases.com
Thu Jan 26 09:34:04 EST 2006


 > Say I have some string that begins with an arbitrary
 > sequence of characters and then alternates repeating the
 > letters 'a' and 'b' any number of times, e.g.
 > "xyz123aaabbaabbbbababbbbaaabb"
 >
 > I'm looking for a regular expression that matches the
 > first, and only the first, sequence of the letter 'a', and
 > only if the length of the sequence is exactly 3.
 >
 > Does such a regular expression exist?  If so, any ideas as
 > to what it could be?
 >

I'm not quite sure what your intent is here, as the
resulting match would obviously be "aaa", of length 3.

If you mean that you want to test a number of strings, and
keep only those where the first run of "a" on the line is
exactly "aaa", you might try something like

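	import re
	listOfStringsToTest = [
		'helloworld',
		'xyz123aaabbaabababbab',
		'cantalopeaaabababa',
		'baabbbaaabbbbb',
		'xyzaa123aaabbabbabababaa']
	# [^a]* skips everything before the first 'a';
	# (a{3})b+ then requires exactly three a's followed by a 'b'
	r = re.compile(r"[^a]*(a{3})b+(a+b+)*")
	# match() anchors at the start of each string
	matches = [s for s in listOfStringsToTest if r.match(s)]
	print(repr(matches))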
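
To see where the triad sits rather than just which strings
match, the match object's span for group 1 can be inspected.
A minimal sketch, assuming Python 3; the helper name
first_triad is made up for illustration:

```python
import re

# Same pattern as above: no 'a' before the triad, and a 'b' right after it,
# so the first run of 'a' has to be exactly three long.
FIRST_RUN_IS_TRIAD = re.compile(r"[^a]*(a{3})b+(a+b+)*")

def first_triad(s):
    """Return (start, end) of the leading 'aaa' run, or None if no match."""
    m = FIRST_RUN_IS_TRIAD.match(s)
    if m is None:
        return None
    return m.span(1)  # span of the captured (a{3}) group

print(first_triad("xyz123aaabbaabbbbababbbbaaabb"))  # (6, 9)
print(first_triad("xyzaa123aaabbabbabababaa"))       # None
```

The second string is rejected because its first run of "a" is
only two long, even though an "aaa" appears later.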

If you just want the *first* triad of "aaa" followed by a
"b", even when other runs of "a" come before it, you can
change the regexp to

	r = re.compile(r".*?(a{3})b+(a+b+)*")
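
The difference between the two regexps shows up when both are
run over the same list. A small side-by-side sketch, assuming
Python 3; the names strict and loose are only labels chosen
here:

```python
import re

candidates = [
    "helloworld",
    "xyz123aaabbaabababbab",
    "cantalopeaaabababa",
    "baabbbaaabbbbb",
    "xyzaa123aaabbabbabababaa",
]

# strict: nothing before the triad may be an 'a'
strict = re.compile(r"[^a]*(a{3})b+(a+b+)*")
# loose: anything may precede the triad, matched non-greedily
loose = re.compile(r".*?(a{3})b+(a+b+)*")

strict_hits = [s for s in candidates if strict.match(s)]
loose_hits = [s for s in candidates if loose.match(s)]

print(strict_hits)  # ['xyz123aaabbaabababbab']
print(loose_hits)   # every candidate except 'helloworld'
```

Note that the looser pattern also accepts strings where the
triad is the tail of a longer run of "a", since ".*?" can
swallow the extra leading "a".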

With a little more detail as to the gist of the problem,
perhaps a better solution can be found.  In particular, are
there items in the listOfStringsToTest that should be found
but aren't with either of the regexps?

-tkc