Match First Sequence in Regular Expression?
Tim Chase
python.list at tim.thechases.com
Thu Jan 26 09:34:04 EST 2006
> Say I have some string that begins with an arbitrary
> sequence of characters and then alternates repeating the
> letters 'a' and 'b' any number of times, e.g.
> "xyz123aaabbaabbbbababbbbaaabb"
>
> I'm looking for a regular expression that matches the
> first, and only the first, sequence of the letter 'a', and
> only if the length of the sequence is exactly 3.
>
> Does such a regular expression exist? If so, any ideas as
> to what it could be?
>
I'm not quite sure what your intent is here, as the
resulting find would obviously be "aaa", of length 3.
If you mean that you want to test a number of strings
and keep only those where "aaa" is the first run of "a" on
the line, you might try something like
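import re

listOfStringsToTest = [
    'helloworld',
    'xyz123aaabbaabababbab',
    'cantalopeaaabababa',
    'baabbbaaabbbbb',
    'xyzaa123aaabbabbabababaa']
# [^a]* skips everything before the first "a"; a{3}b+ then requires
# that first run to be exactly "aaa" followed by at least one "b"
r = re.compile(r"[^a]*(a{3})b+(a+b+)*")
matches = [s for s in listOfStringsToTest if r.match(s)]
print(repr(matches))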
If you just want the *first* triad of "aaa", even when
other "a"s come before it, you can change the regexp to
r = re.compile(".*?(a{3})b+(a+b+)*")
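To see how the two regexps differ, here is a minimal sketch that runs both over the same test strings. The names first_run and first_triad are only illustrative labels for the two patterns above, not anything from the original question.

```python
import re

listOfStringsToTest = [
    'helloworld',
    'xyz123aaabbaabababbab',
    'cantalopeaaabababa',
    'baabbbaaabbbbb',
    'xyzaa123aaabbabbabababaa']

# Pattern 1: no "a" may appear before the "aaa" run.
first_run = re.compile(r"[^a]*(a{3})b+(a+b+)*")
# Pattern 2: anything may precede the first "aaa" that is followed by "b".
first_triad = re.compile(r".*?(a{3})b+(a+b+)*")

first_run_hits = [s for s in listOfStringsToTest if first_run.match(s)]
first_triad_hits = [s for s in listOfStringsToTest if first_triad.match(s)]

print(first_run_hits)    # only 'xyz123aaabbaabababbab'
print(first_triad_hits)  # every string except 'helloworld'
```

One thing to watch with the second pattern: the lazy ".*?" is free to swallow a leading "a", so a string such as 'xaaaab' matches even though its run of "a" is four long. The first pattern rejects that string.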
With a little more detail as to the gist of the problem,
perhaps a better solution can be found. In particular, are
there items in the listOfStringsToTest that should be found
but aren't with either of the regexps?
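If the intent is strictly "the first run of 'a' must be exactly three long", whatever follows it, one possible variant uses a negative lookahead instead of requiring a trailing "b". This is only a sketch under that assumption about the question; the names exactly_three and first_run_is_three are made up for illustration.

```python
import re

# Assumed reading of the question: the first run of "a" must be exactly "aaa".
# [^a]* skips to the first "a"; (?!a) rejects a fourth "a" right after the run.
exactly_three = re.compile(r"[^a]*(aaa)(?!a)")

def first_run_is_three(s):
    """Return True if the first run of 'a' in s has length exactly 3."""
    return exactly_three.match(s) is not None

print(first_run_is_three("xyz123aaabbaabbbbababbbbaaabb"))  # True
print(first_run_is_three("xyzaaa"))        # True: run ends the string
print(first_run_is_three("xyzaaaab"))      # False: first run has four "a"s
print(first_run_is_three("xyzaa123aaab"))  # False: first run has two "a"s
```

Unlike the regexps above, this accepts a string that ends right after the "aaa", and it says nothing about what the rest of the string looks like.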
-tkc