Regular Expression - Matching Multiples of 3 Characters exactly.
Jeff
jeffober at gmail.com
Mon Apr 28 09:22:32 EDT 2008
Regular expressions for that sort of thing can get *really* big. The
most efficient way would be to programmatically compose the regular
expression to be as exact as possible.
import re

def permutation(lst):
    """
    From http://labix.org/snippets/permutations/. Computes permutations
    of a list iteratively.
    """
    queue = [-1]    # stack of indices into lst; -1 marks a slot not yet tried
    lenlst = len(lst)
    while queue:
        i = queue[-1] + 1
        if i == lenlst:
            # Every index has been tried in this slot: backtrack.
            queue.pop()
        elif i not in queue:
            queue[-1] = i
            if len(queue) == lenlst:
                yield [lst[j] for j in queue]
            queue.append(-1)
        else:
            # Index already used in the current partial permutation: skip it.
            queue[-1] = i

def segment_re(a, b):
    """
    Creates a grouped regular expression pattern to match text between all
    possibilities of three-letter sets a and b.
    """
    def pattern(n):
        return "(%s)" % '|'.join([''.join(grp) for grp in permutation(n)])
    return re.compile(r'%s(\w+?)%s' % (pattern(a), pattern(b)))

# Print the composed pattern text rather than the compiled object's repr.
print(segment_re(["a", "b", "c"], ["d", "e", "f"]).pattern)
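
To see what the composed expression looks like and how it behaves, here
is a small sketch. It swaps in itertools.permutations from the standard
library (Python 2.6 and later) for the hand-written generator, purely to
keep the example short. The sample string is made up for illustration.

```python
import re
from itertools import permutations

def segment_re(a, b):
    """Same idea as above, with itertools.permutations standing in
    for the hand-written permutation() generator."""
    def pattern(chars):
        return "(%s)" % "|".join("".join(p) for p in permutations(chars))
    return re.compile(r"%s(\w+?)%s" % (pattern(a), pattern(b)))

rx = segment_re(["a", "b", "c"], ["d", "e", "f"])
print(rx.pattern)

# Hypothetical input: "cab" opens the segment and "fed" closes it.
m = rx.search("xx cabHELLOfed yy")
print(m.groups())
```

Group 1 is whichever ordering of a matched, group 2 is the text in
between, and group 3 is whichever ordering of b matched.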
You could extend segment_re to accept an integer that replaces the
open-ended (\w+?) with a bounded quantifier, such as \w{3,n} to match
from 3 to n characters. This will grow the compiled expression in
memory but make matching faster.
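
One way that extension might look is sketched below. The function name
segment_re_bounded and the parameters min_len and max_len are
hypothetical, not part of the code above, and itertools.permutations is
again used only for brevity.

```python
import re
from itertools import permutations

def segment_re_bounded(a, b, min_len=3, max_len=None):
    r"""Like segment_re, but the middle part is \w{min_len,max_len}?.

    min_len and max_len are illustrative names; max_len=None leaves
    the upper bound open, i.e. \w{min_len,}?.
    """
    def pattern(chars):
        return "(%s)" % "|".join("".join(p) for p in permutations(chars))
    upper = "" if max_len is None else str(max_len)
    middle = r"(\w{%d,%s}?)" % (min_len, upper)
    return re.compile(pattern(a) + middle + pattern(b))

rx = segment_re_bounded("abc", "def", min_len=3, max_len=5)
print(rx.search("abcXYZdef").group(2))   # middle is 3 chars: matches
print(rx.search("abcXYdef"))             # middle is 2 chars: no match
print(rx.search("abcUVWXYZdef"))         # middle is 6 chars: no match
```

With the bounds in place, the engine gives up on a candidate segment
after at most max_len word characters instead of scanning to the end of
the run.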
See http://artfulcode.net/articles/optimizing-regular-expressions/ for
specifics on optimizing regexes.