a RegEx puzzle

Michael Spencer mahs at telcopartners.com
Fri Mar 11 16:16:08 EST 2005


Charles Hartman wrote:
> I'm still shaky on some of sre's syntax. Here's the task: I've got 
> strings (never longer than about a dozen characters) that are guaranteed 
> to be made only of characters 'x' and '/'. In each string I want to find 
> the longest continuous stretch of pairs whose first character is 'x' and 
> the second is either mark. So if marks = '/xx/xxx///', the "winning" 
> stretch begins at position 2 and is 6 characters long ('x/xxx/'), which 
> requires finding a second match that overlaps the first match (which is 
> just 'xx' in position 1). (When there are multiple "winning" stretches, 
> I might want to adjudicate among them, but that's a separate problem.) I 
> hope this is clear enough.
> 
Seconding the suggestion that this needn't use a regex:

  >>> def getlongest(source):
  ...     length = len(source)-1
  ...     longest_match = longest_match_pos = 0
  ...     i = 0
  ...     while longest_match < length:
  ...         j = 0
  ...         while j <= length:
  ...             if source[i + j] != "x":
  ...                 break
  ...             j = j + 2
  ...
  ...         if j > longest_match:
  ...             longest_match = j
  ...             longest_match_pos = i
  ...
  ...
  ...         i = i+1
  ...         length = length - 1
  ...
  ...     return longest_match_pos, longest_match
  ...
  >>> tests = ['/xx/xxx///', '///x/x/xx/xx/x/x/x/', 'x/x/xx///xx/x/x/']
  >>> map(getlongest, tests)
  [(2, 6), (11, 8), (0, 6)]
  >>>
This is an iterative solution with short-circuiting i.e., it stops looking once 
  there are too fewer characters remaining than the longest match found so far.

It also avoids any function calls in the main loop, which helps its speed: 
100,000 tests on 20-character strings take < 2 secs on my (1.5GHz) machine

Michael






More information about the Python-list mailing list