regex question

Duncan Booth duncan.booth at invalid.invalid
Fri Apr 27 11:36:20 EDT 2007


proctor <12cc104 at gmail.com> wrote:

>> >>> re.findall('(.)*', 'abc')
>> ['c', '']

> thank you this is interesting.  in the second example, where does the
> 'nothingness' match, at the end?  why does the regex 'run again' when
> it has already matched everything?  and if it reports an empty match
> along with a non-empty match, why only the two?
> 

There are 4 possible starting points for a regular expression to match in a 
three character string. The regular expression would match at any starting 
point so in theory you could find 4 possible matches in the string. In this 
case they would be 'abc', 'bc', 'c', ''.
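As a rough sketch of those four starting points, you can ask a compiled 
pattern to match at each position in turn (the variable names here are 
only for illustration):

```python
import re

pattern = re.compile('(.)*')
text = 'abc'

# Attempt a match anchored at each of the 4 starting positions:
# before 'a', before 'b', before 'c', and at the very end.
candidates = [pattern.match(text, pos).group(0)
              for pos in range(len(text) + 1)]
print(candidates)  # ['abc', 'bc', 'c', '']
```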

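However findall only returns non-overlapping matches. Once 'abc' has been 
matched, the only place left to try is the very end of the string, where 
the regex matches the empty string. So there are only two possible matches 
and it returns both of them: 'abc' and '' (or rather it returns the 
matching group within the match, so you only see the 'c' although it 
matched 'abc').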
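To see the whole match next to the group that findall reports, something 
like this should do (a small sketch using finditer; the names are only 
for illustration):

```python
import re

pattern = '(.)*'
text = 'abc'

# For each non-overlapping match, record its span, the whole match
# (group 0) and the last thing captured by the group (group 1).
found = [(m.span(), m.group(0), m.group(1))
         for m in re.finditer(pattern, text)]
for span, whole, group in found:
    print(span, repr(whole), repr(group))

# findall reports only the group, with '' where the group took no part.
print(re.findall(pattern, text))
```

The first match covers the whole string and the group holds only its last 
repetition, 'c'. The second match is the empty one at the end, where the 
group did not take part at all.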

If you use a regex which doesn't match an empty string (e.g. '/x(.*?)x/') 
then you won't get the empty match.
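A quick sketch of that, assuming the slashes in the example are 
Perl-style delimiters rather than part of the pattern (Python's re would 
treat them as literal characters), and using a made-up test string:

```python
import re

# Every match must contain at least the two literal 'x' characters,
# so this pattern can never match an empty string.
pattern = 'x(.*?)x'
text = 'xabcx xdefx'   # hypothetical sample input

groups = re.findall(pattern, text)
wholes = [m.group(0) for m in re.finditer(pattern, text)]
print(groups)  # ['abc', 'def']
print(wholes)  # ['xabcx', 'xdefx']
```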


