regex question

Duncan Booth duncan.booth at invalid.invalid
Fri Apr 27 10:37:53 EDT 2007


proctor <12cc104 at gmail.com> wrote:

> so my question remains, why doesn't the star quantifier seem to grab
> all the data.  isn't findall() intended to return all matches?  i
> would expect either 'abc' or 'a', 'b', 'c' or at least just
> 'a' (because that would be the first match).  why does it give only
> one letter, and at that, the /last/ letter in the sequence??
> 
When the pattern contains groups, findall returns the matched groups 
rather than the whole match. You get one group for each parenthesised 
sub-expression, and (the important bit) if a single parenthesised 
expression matches more than once, the group only contains the last 
string which matched it.

Putting a star after a subexpression means that subexpression can match 
zero or more times, but each time it only matches a single character, 
which is why your findall only returned the last character it matched.
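To see that the starred group really does consume the whole string while 
remembering only its last repetition, here is a small sketch using 
re.match instead of findall (the variable names are only for 
illustration):

```python
import re

# '(.)*' repeats a one-character group; the overall match covers 'abc',
# but the group itself keeps only what it matched on its final repetition.
m = re.match(r'(.)*', 'abc')

whole_match = m.group(0)   # 'abc' - everything the pattern consumed
last_capture = m.group(1)  # 'c'   - only the last repetition of the group
```

findall reports group 1 here, not group 0, which is why only the 'c' 
shows up in its result.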

You need to move the * inside the parentheses used to define the group, 
then the group will match only once but will include everything that it 
matched.

Consider:

>>> import re
>>> re.findall('(.)', 'abc')
['a', 'b', 'c']
>>> re.findall('(.)*', 'abc')
['c', '']
>>> re.findall('(.*)', 'abc')
['abc', '']

The first pattern finds a single character which findall manages to 
match 3 times.

The second pattern matches a group containing a single character zero or 
more times, so the first time it matches each of a, b, c in turn and 
returns the c, and then next time around we get an empty string because 
the group matched zero times at the end of the string.

In the third pattern we are looking for a group with any number of 
characters in it. First time we get all of the string, then we get 
another empty match.
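If the trailing empty string is unwanted, one possible approach (a 
sketch, not something the original question asked for) is to use + so 
that the group must match at least one character. finditer can also be 
used to see where each match starts and ends:

```python
import re

# With '+', the group needs at least one character, so no empty match
# is reported at the end of the string.
plus_result = re.findall(r'(.+)', 'abc')   # ['abc']

# The spans show the two matches of '(.)*': the whole string, then an
# empty match at the very end.
spans = [m.span() for m in re.finditer(r'(.)*', 'abc')]   # [(0, 3), (3, 3)]
```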


