regex problem
John Machin
sjmachin at lexicon.net
Tue Jul 26 08:43:34 EDT 2005
Odd-R. wrote:
> Input is a string of four digit sequences, possibly
> separated by a -, for instance like this
>
> "1234,2222-8888,4567,"
>
> My regular expression is like this:
>
> rx1=re.compile(r"""\A(\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,)*\Z""")
>
> When running rx1.findall("1234,2222-8888,4567,")
>
> I only get the last match as the result. Isn't
> findall supposed to return all the matches?
For a start, an expression that starts with \A and ends with \Z will
match the whole string (or not match at all). You have only one match.

Secondly, as you have a group in your expression, findall returns what
the group matches, not what the whole expression matches. Your
expression matches zero or more of what your group matches, provided
there is nothing else at the start/end of the string. A repeated group
keeps only the text of its last repetition. The "zero or more" makes
the re engine waltz about a bit; when the music stopped, the group was
matching "4567,".
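
Both points can be checked by looking at the single match object
directly. The sketch below uses the pattern from the question unchanged;
the variable names (rx_anchored, whole, last, found) are mine and are
only there for illustration.

```python
import re

# Pattern from the original question, anchored at both ends.
rx_anchored = re.compile(r"""\A(\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,)*\Z""")
text = "1234,2222-8888,4567,"

m = rx_anchored.match(text)
whole = m.group(0)   # the one and only match: the entire string
last = m.group(1)    # a repeated group keeps only its last repetition
found = rx_anchored.findall(text)  # findall reports the group, not the match

print(whole)   # 1234,2222-8888,4567,
print(last)    # 4567,
print(found)   # ['4567,']
```
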
Thirdly, findall should be thought of as merely a wrapper around a loop
using the search method -- it finds all non-overlapping matches of a
pattern. So the clue to get from this is that you need a really simple
pattern, like the following. You *don't* have to write an expression
that does the looping.
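
To make the "wrapper around a loop" idea concrete, here is a rough
sketch of such a loop. findall_via_search is a hypothetical helper of
mine, not how the re module is actually implemented; it ignores groups
and may differ from the real findall in edge cases involving empty
matches.

```python
import re

def findall_via_search(pattern, text):
    """Hypothetical helper: collect whole matches by calling search() in a loop."""
    results = []
    pos = 0
    while pos <= len(text):
        m = pattern.search(text, pos)
        if m is None:
            break
        results.append(m.group(0))
        # Resume after this match; step one character past an empty
        # match so the loop always makes progress.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return results

rx_simple = re.compile(r"""\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,""")
text = "1234,2222-8888,4567,"
print(findall_via_search(rx_simple, text))
```

For this pattern and input the helper should agree with
rx_simple.findall(text), because the pattern has no groups and never
matches the empty string.
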
So here's the mean lean no-flab version -- you don't even need the
parentheses (sorry, Thomas).
>>> import re
>>> rx1=re.compile(r"""\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,""")
>>> rx1.findall("1234,2222-8888,4567,")
['1234,', '2222-8888,', '4567,']
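
If the whole-string check that the original \A...\Z pattern performed is
still wanted, one option is to keep the two jobs separate: validate
first, then extract with the simple pattern. This is only a sketch. The
\d{4} and (?:...) spellings, and the names rx_item, rx_valid and
parse_items, are my own shorthand and not part of the original post.

```python
import re

# Same items as the simple pattern. (?:...) is non-capturing, so
# findall still returns whole matches rather than a group's contents.
rx_item = re.compile(r"\b\d{4}(?:-\d{4})?,")

# Whole-string check: zero or more items and nothing else.
rx_valid = re.compile(r"\A(?:\d{4}(?:-\d{4})?,)*\Z")

def parse_items(text):
    """Hypothetical helper: return the items, or None if text is malformed."""
    if rx_valid.match(text) is None:
        return None
    return rx_item.findall(text)

print(parse_items("1234,2222-8888,4567,"))  # ['1234,', '2222-8888,', '4567,']
print(parse_items("1234,22x2,"))            # None
```
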
HTH,
John