regex question

proctor 12cc104 at gmail.com
Fri Apr 27 11:17:51 EDT 2007


On Apr 27, 8:37 am, Duncan Booth <duncan.bo... at invalid.invalid> wrote:
> proctor <12cc... at gmail.com> wrote:
> > so my question remains, why doesn't the star quantifier seem to grab
> > all the data.  isn't findall() intended to return all matches?  i
> > would expect either 'abc' or 'a', 'b', 'c' or at least just
> > 'a' (because that would be the first match).  why does it give only
> > one letter, and at that, the /last/ letter in the sequence??
>
> findall returns the matched groups. You get one group for each
> parenthesised sub-expression, and (the important bit) if a single
> parenthesised expression matches more than once the group only contains
> the last string which matched it.
>
> Putting a star after a subexpression means that subexpression can match
> zero or more times, but each time it matches only a single character,
> which is why your findall returned only the last character it matched.
>
> You need to move the * inside the parentheses used to define the group;
> then the group will match only once but will include everything that it
> matched.
>
> Consider:
>
> >>> re.findall('(.)', 'abc')
> ['a', 'b', 'c']
> >>> re.findall('(.)*', 'abc')
> ['c', '']
> >>> re.findall('(.*)', 'abc')
> ['abc', '']
>
> The first pattern finds a single character which findall manages to
> match 3 times.
>
> The second pattern repeats a group containing a single character zero
> or more times, so the first time it matches each of a, b, c in turn and
> returns the c; then, the next time around, we get an empty string
> because the group matched zero times.
>
> In the third pattern we are looking for a group with any number of
> characters in it. The first time we get the whole string, then we get
> another empty match.
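A minimal sketch using only the standard-library `re` module, to make the group behaviour described above concrete. With the star outside the parentheses, group 1 is re-captured on each repetition and only its last capture survives; with the star inside, group 1 is captured once and spans the whole run. The variable names are illustrative only; the findall results agree with the ones quoted above.

```python
import re

text = "abc"

# Star OUTSIDE the group: group 1 is re-captured on every repetition,
# so only the last character it matched survives.
outside = re.match(r"(.)*", text)
print(outside.group(0), outside.group(1))   # abc c

# Star INSIDE the group: group 1 is captured once and spans the whole run.
inside = re.match(r"(.*)", text)
print(inside.group(1))                      # abc

# With a group in the pattern, findall returns the group's contents...
with_group = re.findall(r"(.)*", text)      # ['c', '']
# ...and with no group at all, it returns the whole matched text.
without_group = re.findall(r".*", text)     # ['abc', '']
print(with_group, without_group)
```

Note that `group(0)` is 'abc' in the first case: the overall match does consume the whole string, it is only the group that forgets the earlier repetitions.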

thank you this is interesting.  in the second example, where does the
'nothingness' match, at the end?  why does the regex 'run again' when
it has already matched everything?  and if it reports an empty match
along with a non-empty match, why only the two?

sincerely,
proctor
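One way to see where the 'nothingness' matches is `re.finditer`, which scans the string the same way findall does but yields match objects, so each match's position is visible. This is a sketch against the same `(.)*` pattern, with illustrative variable names. The scan resumes at the position where the previous match ended: the first match consumes all of 'abc' and ends at position 3, so the pattern is tried once more at position 3, and because `(.)*` may repeat zero times it succeeds there with an empty match. Group 1 takes no part in that empty match, so finditer reports `None` for it and findall substitutes `''`.

```python
import re

text = "abc"

# Collect (span, group 1) for every match, in the order they are found.
spans = [(m.span(), m.group(1)) for m in re.finditer(r"(.)*", text)]
for span, group in spans:
    print(span, repr(group))
# (0, 3) 'c'   <- the star consumes "abc"; group 1 holds its last capture
# (3, 3) None  <- empty match at the end; group 1 never participated
```

There are only two matches because the scanner will not report a second empty match at the same position, and at the end of the string there is nothing left to match.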




More information about the Python-list mailing list