re findall mod for issue of side effects

Andrew Henshaw andrew_dot_henshaw_at_earthling_dot_net
Sun Jan 14 23:17:02 EST 2001


I've changed one line, added one line, and added a 'grammaticGrouping' parameter
to the definition of RegexObj.findall (the parameter is also added to the
user-level API).  This appears to make findall behave the way I wanted: it
gives a way to use grouping in a regex pattern without getting tuples back
in the findall result.

An example:

>>> s='..abcabcxyz..'

# Try a simple pattern
>>> r=re.compile('abcxyz')
>>> r.findall(s)
['abcxyz']

# Now add some grouping to the pattern
>>> r=re.compile('(abc)*(xyz)*')
>>> r.findall(s)
[('', ''), ('', ''), ('abc', 'xyz'), ('', ''), ('', ''), ('', '')]
# Wow, that changed the return value dramatically

# Set the grammaticGrouping flag to 1, using the patched findall code
>>> r.findall(s,1)
['abcabcxyz']
# That's consistent with the result type from the first pattern.
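For comparison, here is a sketch of the same session using only standard `re` features, with no patch. It assumes a current Python 3 `re` module, whose empty-match rules differ slightly from the 2001 module. `finditer` with `group(0)` returns the whole match whatever groups the pattern has. Non-capturing groups `(?:...)` keep `findall` from returning tuples at all. In both cases the empty matches still have to be dropped by hand.

```python
import re

s = '..abcabcxyz..'

# Capturing groups: findall returns one tuple of group contents per match.
r = re.compile('(abc)*(xyz)*')
grouped = r.findall(s)

# Whole-match text via finditer, ignoring the groups entirely;
# empty matches are filtered out, as the patched findall does.
whole = [m.group(0) for m in r.finditer(s) if m.group(0)]

# Non-capturing groups: findall returns plain strings instead of tuples.
nc = re.compile('(?:abc)*(?:xyz)*')
plain = [t for t in nc.findall(s) if t]

print(grouped[2])  # ('abc', 'xyz')
print(whole)       # ['abcabcxyz']
print(plain)       # ['abcabcxyz']
```

Non-capturing groups are the lighter fix when the groups exist only for repetition. `finditer` is the better choice when the groups are also needed elsewhere.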


Does anybody else find this as useful as I do?

AH


####### patched code follows  ##########
    def findall(self, source, grammaticGrouping=0):    #new grammaticGrouping parameter
        """Return a list of all non-overlapping matches in the string.

        If one or more groups are present in the pattern and
        grammaticGrouping is false, return a list of groups; this will
        be a list of tuples if the pattern has more than one group.

        If grammaticGrouping is true, groups are ignored: each element
        is the text of the whole match, and empty matches are omitted.
        Otherwise empty matches are included in the result.

        """
        pos = 0
        end = len(source)
        results = []
        match = self.code.match
        append = results.append
        while pos <= end:
            regs = match(source, pos, end, 0)
            if not regs:
                break
            i, j = regs[0]
            rest = regs[1:]
            if not rest or grammaticGrouping:   #new: changed from 'if not rest:'
                gr = source[i:j]
            elif len(rest) == 1:
                a, b = rest[0]
                gr = source[a:b]
            else:
                gr = []
                for (a, b) in rest:
                    gr.append(source[a:b])
                gr = tuple(gr)
            #was: append(gr)
            if gr or not grammaticGrouping:     #new: added this line
                append(gr)                      #new: indented
            pos = max(j, pos+1)
        return results
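The patched logic can be tried without modifying the library by reproducing it as a standalone function over `re.finditer`. This is a sketch under several assumptions. `findall_grouping` is a hypothetical name. It takes a compiled pattern from the current `re` module rather than using the old `RegexObject` internals (`self.code.match`) shown above. It follows current `re` empty-match rules rather than the `pos = max(j, pos+1)` loop.

```python
import re

def findall_grouping(pattern, source, grammatic_grouping=False):
    """Hypothetical standalone version of the patched findall.

    With grammatic_grouping false, this behaves like pattern.findall.
    With it true, it returns the whole text of each non-empty match,
    ignoring any groups in the pattern.
    """
    results = []
    for m in pattern.finditer(source):
        # Unmatched groups come back as '' rather than None, as in findall.
        groups = m.groups(default='')
        if not groups or grammatic_grouping:
            gr = m.group(0)          # whole match text
        elif len(groups) == 1:
            gr = groups[0]           # single group: plain string
        else:
            gr = groups              # several groups: tuple of strings
        # Same filter as the patch: drop empty matches only in grouping mode.
        if gr or not grammatic_grouping:
            results.append(gr)
    return results


r = re.compile('(abc)*(xyz)*')
s = '..abcabcxyz..'
print(findall_grouping(r, s, True))   # ['abcabcxyz']
```

With the flag left false, the function returns the same list as `r.findall(s)`. Only the flagged path changes the result type.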






More information about the Python-list mailing list