regex problem

John Machin sjmachin at lexicon.net
Tue Jul 26 08:43:34 EDT 2005


Odd-R. wrote:
> Input is a string of four digit sequences, possibly
> separated by a -, for instance like this
> 
> "1234,2222-8888,4567,"
> 
> My regular expression is like this:
> 
> rx1=re.compile(r"""\A(\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,)*\Z""")
> 
> When running rx1.findall("1234,2222-8888,4567,")
> 
> I only get the last match as the result. Isn't
> findall supposed to return all the matches?

For a start, an expression that starts with \A and ends with \Z either 
matches the whole string or does not match at all. So you have only one 
match.

Secondly, as you have a group in your expression, findall returns what 
the group matches, not what the whole expression matches. Your expression 
matches zero or more repetitions of the group, provided there is nothing 
else at the start/end of the string. A repeated group keeps only the text 
of its last repetition, so the "zero or more" makes the re engine waltz 
about a bit; when the music stopped, the group was matching "4567,".
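
To see both effects at once, here is a small sketch using the pattern and 
input from the question. The variable names (anchored, whole, last, found) 
are mine, chosen only for illustration, and it assumes a current Python 3.

```python
import re

# The pattern from the question: anchored at both ends, one repeated group.
anchored = re.compile(r"\A(\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,)*\Z")
text = "1234,2222-8888,4567,"

m = anchored.search(text)
whole = m.group(0)   # the single match covers the entire string
last = m.group(1)    # the group holds only its final repetition

# findall reports the group's text once, because there is one match.
found = anchored.findall(text)
print(whole, last, found)
```

The earlier repetitions ("1234," and "2222-8888,") were matched along the 
way, but the group only remembers the last one.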

Thirdly, findall should be thought of as merely a wrapper around a loop 
using the search method -- it finds all non-overlapping matches of a 
pattern. The lesson is that you need a really simple pattern that matches 
a single item, like the following. You *don't* have to write an 
expression that does the looping.
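
As a rough model of that "loop around search" idea, something like the 
sketch below behaves the same as findall for a pattern without groups. 
The helper name findall_by_search is made up for this example; it is not 
how the re module is implemented, and its handling of empty matches is 
only approximate.

```python
import re

def findall_by_search(pattern, text):
    """Approximate findall for a group-free pattern via repeated search."""
    results = []
    pos = 0
    while pos <= len(text):
        m = pattern.search(text, pos)
        if m is None:
            break
        results.append(m.group(0))
        # Resume after this match; step one character past an empty
        # match so the loop always makes progress.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return results

item = re.compile(r"\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,")
text = "1234,2222-8888,4567,"
looped = findall_by_search(item, text)
builtin = item.findall(text)
print(looped)
```

The pattern describes one item; the loop (or findall) takes care of 
walking along the string.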

So here's the mean lean no-flab version -- you don't even need the 
parentheses (sorry, Thomas).

 >>> import re
 >>> rx1=re.compile(r"""\b\d\d\d\d,|\b\d\d\d\d-\d\d\d\d,""")
 >>> rx1.findall("1234,2222-8888,4567,")
['1234,', '2222-8888,', '4567,']
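
If you prefer something shorter, one possible variant (my own rewrite, 
not part of the original question) uses a counted repeat and an optional 
non-capturing group. The sketch also shows what happens if that group is 
made capturing: findall then returns only the group's text, with an empty 
string where the group did not take part in the match.

```python
import re

text = "1234,2222-8888,4567,"

# Same items as the two-alternative pattern. The (?:...) group is
# non-capturing, so findall still returns the whole of each match.
compact = re.compile(r"\b\d{4}(?:-\d{4})?,")
items = compact.findall(text)

# With a capturing group, findall returns the group's text instead.
capturing = re.compile(r"\b\d{4}(-\d{4})?,")
groups_only = capturing.findall(text)

print(items)
print(groups_only)
```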

HTH,
John


