how to get all repeated group with regular expression

MRAB google at mrabarnett.plus.com
Fri Nov 21 12:00:56 EST 2008


Steve Holden wrote:
> Please keep this on the list.
> 
> scsoce wrote:
>> Steve Holden wrote:
>>> scsoce wrote:
>>>  
>>>> say, when I try to search for and match every char in a variable-length
>>>> string such as '123456', I tried re.findall( r'(\d)*, '12346' )
>>>>     
>>> I think you will find you missed a quote out there. Always better to
>>> copy and paste ...
>>>
>>>  
>>>> , but I only get '6', and the Python docs indeed say: "If a group is contained
>>>> in a part of the pattern that matched multiple times, the last match is
>>>> returned."
>>>>     
>>> So use
>>>
>>>     r'(\d*)'
>>>
>>> instead and then the group includes all the digits you match.
>>>
>>>  
>>>> Is that because the regex engine cannot remember all the past history? Is it
>>>> inherent to all regex engines, or only to Python's?
>>>>     
>>> Different regex engines have different capabilities, so I can't speak to
>>> them all. If you wanted *all* the matches of *all* groups, how would you
>>> have them returned? As a list? That would make the case where there was
>>> only one match much trickier to handle. And what would you do with
>>>
>>>   r'((\w)*\d)*'
>>>
>>> Also, what about named groups? I can see enough potential implementation
>>> issues that I can perfectly understand why Python works the way it does,
>>> so I'd be interested to know why it doesn't make sense to you, and what
>>> you would prefer it to do.
>>>
>>> regards
>>>  Steve
>>>   
>> Maybe my explanation was not clear. I want to capture every matched part
>> of a repeated pattern, not only the last. Say, for the string '123456', I
>> want to be able to back-reference any one char, not only the '6'. I know the
>> example is very simple, so we can get the whole string using a regex and
>> then get every char using other Python statements, but what if the pattern
>> in the group is complex?
>> I tested in Vim, and it can do the 'back reference':
>> == your text in vim:
>> 123456
>> == pattern:
>> :%s/\(\d\)*/$2
>> the text will turn into:
>> 2
>>
> 'Fraid the Python re implementers just decided not to do it that way.
> 
Nor Perl.

Probably what you want is re.findall(r"(\d)", "123456"), which returns a 
list of what it captured.
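
For comparison, here is a minimal sketch of the three patterns discussed in
this thread, using only the standard re module under Python 3. The second
part is one possible way to handle a more complex repeated group: match the
whole run first, then apply the inner pattern to it with findall. The helper
name repeated_captures and the sample strings are hypothetical, made up for
illustration only.

```python
import re

text = "123456"

# A repeated group keeps only its last iteration; findall also reports
# the empty match at the end of the string.
last_only = re.findall(r"(\d)*", text)
print(last_only)    # ['6', '']

# Repetition inside the group: the whole run is captured as one string.
whole_run = re.findall(r"(\d*)", text)
print(whole_run)    # ['123456', '']

# No repetition in the pattern: findall does the repeating itself.
each_digit = re.findall(r"(\d)", text)
print(each_digit)   # ['1', '2', '3', '4', '5', '6']


def repeated_captures(outer, inner, s):
    """Hypothetical helper: find the first span matched by `outer`,
    then return every match of `inner` inside that span."""
    m = re.search(outer, s)
    if m is None:
        return []
    return re.findall(inner, m.group(0))


# Made-up input with a more complex repeated unit, "name=digit;".
pairs = repeated_captures(r"(?:\w=\d;)+", r"(\w)=(\d);", "x a=1;b=2;c=3; y")
print(pairs)        # [('a', '1'), ('b', '2'), ('c', '3')]
```

The two-step version trades a single pattern for two simpler ones, which
avoids relying on the engine to remember every iteration of a group.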



