re findall mod for issue of side effects
Andrew Henshaw
andrew_dot_henshaw_at_earthling_dot_net
Sun Jan 14 23:17:02 EST 2001
I've changed one line, added one line, and added a 'grammaticGrouping' parameter
to the definition of RegexObj.findall (the same parameter is added to the
module-level findall in the user API as well). This makes findall behave the
way I wanted: it lets me use grouping in a regex pattern without getting
tuples back in the findall result.
An example:
>>> s='..abcabcxyz..'
# Try a simple pattern
>>> r=re.compile('abcxyz')
>>> r.findall(s)
['abcxyz']
# Now add some grouping to the pattern
>>> r=re.compile('(abc)*(xyz)*')
>>> r.findall(s)
[('', ''), ('', ''), ('abc', 'xyz'), ('', ''), ('', ''), ('', '')]
# Wow, that changed the return value dramatically
# Set the grammaticGrouping flag to 1, using the patched findall code
>>> r.findall(s,1)
['abcabcxyz']
# That's consistent with the result type from the first pattern.
Does anybody else find this as useful as I do?
AH
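For comparison, here is a minimal sketch of a different technique that the standard `re` module already supports: non-capturing groups `(?:...)`. It is not part of the patch above. The groups still scope the `*` operator, and findall returns plain strings instead of tuples. Unlike the patched `findall(s, 1)`, however, empty matches are still returned, so this sketch filters them out explicitly.

```python
import re

s = '..abcabcxyz..'

# (?:...) groups without capturing, so findall returns whole-match strings
# rather than tuples of group contents.
matches = re.findall(r'(?:abc)*(?:xyz)*', s)

# Empty matches are still in the list, so drop them to mirror the
# patched findall(s, 1) behaviour.
non_empty = [m for m in matches if m]
print(non_empty)  # ['abcabcxyz']
```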
####### patched code follows ##########
def findall(self, source, grammaticGrouping=0):  # new: grammaticGrouping parameter
    """Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern and
    grammaticGrouping is false, return a list of groups; this will
    be a list of tuples if the pattern has more than one group.
    If grammaticGrouping is true, return the text of each whole
    match, ignoring any groups.

    Empty matches are included in the result unless
    grammaticGrouping is true.
    """
    pos = 0
    end = len(source)
    results = []
    match = self.code.match
    append = results.append
    while pos <= end:
        regs = match(source, pos, end, 0)
        if not regs:
            break
        i, j = regs[0]
        rest = regs[1:]
        if not rest or grammaticGrouping:  # new: changed from 'if not rest:'
            gr = source[i:j]
        elif len(rest) == 1:
            a, b = rest[0]
            gr = source[a:b]
        else:
            gr = []
            for (a, b) in rest:
                gr.append(source[a:b])
            gr = tuple(gr)
        # was: append(gr)
        if gr or not grammaticGrouping:  # new: added this line
            append(gr)  # new: indented
        pos = max(j, pos+1)
    return results
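The patch depends on the internals of the old RegexObj (`self.code.match` returning register pairs), so it cannot be run on its own. Below is a minimal sketch of the same grammaticGrouping behaviour layered on top of the public `re` API. The wrapper name `findall_grouped` and its `grammatic_grouping` parameter are hypothetical and are not part of any real library. It assumes that `re.finditer`'s match objects, which always expose the whole match as `group(0)`, are an acceptable stand-in for the patched match loop.

```python
import re


def findall_grouped(pattern, string, grammatic_grouping=False, flags=0):
    """Hypothetical wrapper mimicking the patched findall's flag.

    With grammatic_grouping false, behave exactly like re.findall
    (groups returned, empty matches included).  With it true, return
    the whole-match text for every non-empty match, ignoring groups.
    """
    regex = re.compile(pattern, flags)
    if not grammatic_grouping:
        return regex.findall(string)
    # finditer yields match objects, so group(0) (the whole match) is
    # available even when the pattern defines capturing groups.
    return [m.group(0) for m in regex.finditer(string) if m.group(0)]


s = '..abcabcxyz..'
print(findall_grouped(r'(abc)*(xyz)*', s, True))  # ['abcabcxyz']
print(findall_grouped('abcxyz', s))               # ['abcxyz']
```

Delegating to the stock findall when the flag is false keeps the default behaviour identical to `re.findall`. This mirrors how the patch only diverges from the original code when grammaticGrouping is set.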