regular expressions: grabbing variables from multiple matches

Alex Martelli aleaxit at yahoo.com
Thu Jan 4 04:57:24 EST 2001


"D-Man" <dsh8290 at rit.edu> wrote in message
news:mailman.978580504.31501.python-list at python.org...
>
> At first I was confused by what you meant with findall returning a
> list of matches, not match objects.
>
> Here's my interpreter session (excuse the poor variable naming, it's
> just for testing):
>
> >>> import re
> >>> r = re.compile( "asdf" )
> >>> m = r.match( "asdf" )
> >>> m
> <SRE_Match object at 0x8130ca8>
> >>> ml = r.findall( "asdfasdf" )
> >>> ml
> ['asdf', 'asdf']
> >>> for str in ml :
> ...     m = r.match( str )
> ...     print m
> ...
> <SRE_Match object at 0x8111108>
> <SRE_Match object at 0x812ee28>
> >>>
>
> Since findall gives you back the strings that match the regex, you can
> go through them and call match() to get a match object.  This should
> be faster than matching arbitrary text since you know (in the loop)
> all the strings will match.
>
> It will probably hurt performance though, but at least it will be
> functional.

Actually, I suspect this is not what the original poster needed.

A little modification to the final loop may show it more clearly:

>>> for str in ml:
...     m = r.match(str)
...     print m.start(),m
...
0 <SRE_Match object at 00812090>
0 <SRE_Match object at 008111D0>
>>>

See...?  What we're getting is just a description of how the re
matches (starting at 0) each substring -- we've LOST the info of
how the matches were located in the original string (start and
end information).
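The same point can be shown with offsets that are not all zero. This is a minimal sketch in Python 3 syntax; the sample string and the variable names are made up for illustration.

```python
import re

r = re.compile("asdf")
text = "xxasdf--asdf"   # made-up sample; matches sit at 2..6 and 8..12

# findall yields plain strings, so re-matching each one always reports 0..4
rematched = [r.match(s).span() for s in r.findall(text)]

# searching the original string keeps the real offsets (first match shown)
first = r.search(text).span()

print(rematched)   # [(0, 4), (0, 4)]
print(first)       # (2, 6)
```

Both re-matched spans come out identical, although the two substrings sit at different places in the original string.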

The "searchall" function one might well desire (to give all the
match-objects for all non-overlapping matches of one re in a
string) needs a different approach, something like (in your
interactive-session terms):

>>> matches=[]
>>> pos=0
>>> while 1:
...   mo = r.search('asdfasdf',pos)
...   if mo is None: break
...   matches.append(mo)
...   pos = mo.end()
...
>>> for match in matches:
...   print match.start(), match.end()
...
0 4
4 8
>>>

i.e., as a function, it could be:

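def searchall(re, string):
    # re is a compiled regular-expression object, not the module
    matches = []
    pos = 0
    while pos <= len(string):
        mo = re.search(string, pos)
        if mo is None: break
        matches.append(mo)
        # resume after this match; step one further past an empty
        # match, or the loop would never advance
        pos = mo.end()
        if mo.end() == mo.start(): pos = pos + 1
    return matches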
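To tie this back to the original question (grabbing groups from each of several matches), here is a minimal sketch of the same loop in Python 3 syntax. The parameter names pattern and text, the name=value pattern, and the sample string are assumptions made for illustration. The sketch also steps past empty matches, since a pattern such as "x*" would otherwise never move pos forward.

```python
import re

def searchall(pattern, text):
    """Return match objects for all non-overlapping matches of pattern in text."""
    matches = []
    pos = 0
    while pos <= len(text):
        mo = pattern.search(text, pos)
        if mo is None:
            break
        matches.append(mo)
        # Resume after the match; go one character further past an empty match.
        pos = mo.end() if mo.end() > mo.start() else mo.end() + 1
    return matches

# Hypothetical sample data: name=value pairs.
pair = re.compile(r"(\w+)=(\d+)")
found = searchall(pair, "x=1, yy=22, zzz=333")

# Each match object still carries both its groups and its position.
summary = [(mo.group(1), mo.group(2), mo.start()) for mo in found]
print(summary)   # [('x', '1', 0), ('yy', '22', 5), ('zzz', '333', 12)]
```

Because the match objects are kept, the groups and the start/end offsets in the original string are both available afterwards.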

Nothing too terrible, sure, but it might be nice
to have this as a part of the module (and as a
method of re objects), since I suspect it is needed
at least as often as findall (which suffices only
when the matching strings are wanted, not the
match positions).
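For readers on later Python releases: the re module there provides finditer, both as a module function and as a method of compiled patterns, which yields a match object for each non-overlapping match. A minimal sketch in Python 3 syntax, reusing the pattern and string from the session above:

```python
import re

r = re.compile("asdf")

# finditer yields match objects, so start/end positions are preserved
spans = [mo.span() for mo in r.finditer("asdfasdf")]
print(spans)   # [(0, 4), (4, 8)]
```

The spans agree with the start and end values printed by the search loop.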


Alex





