a RegEx puzzle

Kent Johnson kent37 at tds.net
Fri Mar 11 13:06:46 EST 2005


Charles Hartman wrote:
> I'm still shaky on some of sre's syntax. Here's the task: I've got 
> strings (never longer than about a dozen characters) that are guaranteed 
> to be made only of characters 'x' and '/'. In each string I want to find 
> the longest continuous stretch of pairs whose first character is 'x' and 
> the second is either mark. So if marks = '/xx/xxx///', the "winning" 
> stretch begins at position 2 and is 6 characters long ('x/xxx/'), which 
> requires finding a second match that overlaps the first match (which is 
> just 'xx' in position 1). (When there are multiple "winning" stretches, 
> I might want to adjudicate among them, but that's a separate problem.) I 
> hope this is clear enough.
> 
> Here's the best I've come up with so far:
> 
> pat = sre.compile('(x[x/])+')
> (longest, startlongest) = max([(fnd.end()-fnd.start(), fnd.start()) for 
> i in range(len(marks))
>             for fnd in pat.finditer(marks,i)])

It's pretty simple to put re.search() into a loop where subsequent searches start from the character 
after where the previous one matched. Here is a solution that uses a general-purpose longest match 
function:

import re

# RE solution
def longestMatch(rx, s):
     ''' Find the longest match for rx in s.
         Returns (start, length) for the match or (None, None) if no match found.
     '''

     start = length = current = 0

     while True:
         m = rx.search(s, current)
         if not m:
             break

         mStart, mEnd = m.span()
         current = mStart + 1

         if (mEnd - mStart) > length:
             start = mStart
             length = mEnd - mStart

     if length:
         return start, length

     return None, None


pairsRe = re.compile(r'(x[x/])+')

for s in [ '/xx/xxx///', '//////xx//' ]:
     print s, longestMatch(pairsRe, s)



More information about the Python-list mailing list