find all index positions

John Machin sjmachin at lexicon.net
Fri May 12 17:59:15 EDT 2006


On 13/05/2006 1:45 AM, vbgunz wrote:
> Hello John,
> 
> Thank you very much for your pointers! I decided to redo it and try to
> implement your suggestion. I think I did a fair job and because of your
> suggestion have a better iterator. Thank you!
> 
> def indexer(string, substring, overlap=1):
>     '''indexer(string, substring, [overlap=1]) -> int
> 
>     indexer takes a string and searches it to return all substring
>     indexes. by default indexer is set to overlap all occurrences.
>     to get the index to whole words only, set the overlap argument
>     to the length of the substring.

(1) Computing the length should be done inside the function, if 
necessary, which (2) avoids the possibility of passing in the wrong 
length. (3) "whole words only" does *NOT* mean the same as "substrings 
don't overlap".
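
To illustrate point (3), here is a minimal sketch in Python 3. The helper names nonoverlapping_positions and whole_word_positions are hypothetical, and "whole word" is assumed to mean "delimited by the regex \b word boundary", which is only one possible definition.

```python
import re

def nonoverlapping_positions(text, target):
    # re.finditer never returns overlapping matches, so this gives the
    # start of every non-overlapping occurrence, inside a word or not.
    return [m.start() for m in re.finditer(re.escape(target), text)]

def whole_word_positions(text, word):
    # \b only matches at a boundary between a word character and a
    # non-word character (or the start/end of the text).
    pattern = r'\b' + re.escape(word) + r'\b'
    return [m.start() for m in re.finditer(pattern, text)]

print(nonoverlapping_positions('show how chow', 'how'))  # [1, 5, 10]
print(whole_word_positions('show how chow', 'how'))      # [5]
```

None of the three occurrences overlap, yet only one of them is a whole word.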

> The only pitfall to indexer is
>     it will return the substring whether it stands alone or not.
> 
>     >>> list(indexer('ababababa', 'aba'))
>     [0, 2, 4, 6]
> 
>     >>> list(indexer('ababababa', 'aba', len('aba')))
>     [0, 4]
> 
>     >>> list(indexer('ababababa', 'xxx'))
>     []
> 
>     >>> list(indexer('show chow', 'how'))
>     [1, 6]
>     '''
> 
>     index = string.find(substring)
>     if index != -1:
>         yield index
> 
>     while index != -1:
>         index = string.find(substring, index + overlap)
>         if index == -1: continue
>         yield index

Quite apart from the fact that you are now using both 'string' *AND* 
'index' outside their usual meaning, this is hard to follow. (1) You 
*CAN* avoid doing the 'find' twice without losing readability and 
elegance. (2) continue?? Somebody hits you if you use the 'return' 
statement or the 'break' statement?

Sigh. I'll try once more. Here is the function I wrote, with the minimal 
changes required to make it an iterator, plus changing from 0/1 to 
False/True:

def findallstr(text, target, overlapping=False):
      startpos = 0
      if overlapping:
          jump = 1
      else:
          jump = max(1, len(target))
      while True:
          newpos = text.find(target, startpos)
          if newpos == -1:
              return
          yield newpos
          startpos = newpos + jump
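
As a usage sketch (Python 3), the block below repeats findallstr so that it runs on its own, and adds a hypothetical naive_positions variant, not part of the original post, that steps by len(target) with no lower bound. It is there only to show what the max(1, len(target)) guard protects against: with an empty target the step is 0 and the search never advances.

```python
from itertools import islice

def findallstr(text, target, overlapping=False):
    startpos = 0
    # Step 1 finds overlapping hits; otherwise skip past the whole match,
    # but never by less than 1 so an empty target still advances.
    jump = 1 if overlapping else max(1, len(target))
    while True:
        newpos = text.find(target, startpos)
        if newpos == -1:
            return
        yield newpos
        startpos = newpos + jump

def naive_positions(text, target):
    # Hypothetical variant without the max(1, ...) guard.
    startpos = 0
    while True:
        newpos = text.find(target, startpos)
        if newpos == -1:
            return
        yield newpos
        startpos = newpos + len(target)  # 0 for '', so we are stuck

print(list(findallstr('ababababa', 'aba')))                    # [0, 4]
print(list(findallstr('ababababa', 'aba', overlapping=True)))  # [0, 2, 4, 6]
print(list(findallstr('qwerty', '')))            # [0, 1, 2, 3, 4, 5, 6]
# islice caps the otherwise endless generator at four items.
print(list(islice(naive_positions('qwerty', ''), 4)))          # [0, 0, 0, 0]
```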

> 
> if __name__ == '__main__':
>     print list(indexer('ababababa', 'aba'))  # -> [0, 2, 4, 6]
>     print list(indexer('ababababa', 'aba', len('aba')))  # -> [0, 4]
>     print list(indexer('ababababa', 'xxx'))  # -> []
>     print list(indexer('show chow', 'how'))  # -> [1, 6]
> 

Get yourself a self-checking testing mechanism, and a more rigorous set 
of tests. Ultimately you will want to look at unittest or pytest, but 
for a small library of functions, you can whip up your own very quickly. 
Here is what I whipped up yesterday:

def indexer2(string, target):
     res = []
     if string.count(target) >= 1:
         res.append(string.find(target))
         if string.count(target) >= 2:
             for item in xrange(string.count(target) - 1):
                 res.append(string.find(target, res[-1] + 1))
     return res # dedent fixed

if __name__ == '__main__':
     tests = [
         ('a long long day is long', 'long',  [2, 7, 19], [2, 7, 19]),
         ('a long long day is long', 'day',   [12], [12]),
         ('a long long day is long', 'short', [], []),
         ('abababababababa', 'aba', [0, 4, 8, 12], [0, 2, 4, 6, 8, 10, 12]),
         ('qwerty', '', range(7), range(7)),
         ('', 'qwerty', [], []),
         ]
     for test in tests:
         text, target = test[:2]
         results = test[2:]
         for olap in range(2):
             result = list(findallstr(text, target, olap))  # it's a generator now
             print (
                 'FAS', text, target, olap,
                 result, results[olap], result == results[olap],
                 )
     for test in tests:
         text, target = test[:2]
         results = test[2:]
         result = indexer2(text, target)
         print (
             'INDXR2', text, target,
             result,  result == results[0], result == results[1],
             )

Make sure your keyboard interrupt is not disabled before you run the 
2nd-last test :-)
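
For the unittest route mentioned above, the same table of cases could be expressed roughly as follows. This is a sketch in Python 3, not the harness from this post: the class name FindAllStrTests and the CASES layout are assumptions, and findallstr is repeated only so the block runs on its own.

```python
import unittest

def findallstr(text, target, overlapping=False):
    startpos = 0
    jump = 1 if overlapping else max(1, len(target))
    while True:
        newpos = text.find(target, startpos)
        if newpos == -1:
            return
        yield newpos
        startpos = newpos + jump

# (text, target, expected non-overlapping, expected overlapping)
CASES = [
    ('a long long day is long', 'long', [2, 7, 19], [2, 7, 19]),
    ('a long long day is long', 'day', [12], [12]),
    ('a long long day is long', 'short', [], []),
    ('abababababababa', 'aba', [0, 4, 8, 12], [0, 2, 4, 6, 8, 10, 12]),
    ('qwerty', '', list(range(7)), list(range(7))),
    ('', 'qwerty', [], []),
]

class FindAllStrTests(unittest.TestCase):
    def test_table(self):
        for text, target, plain, overlapped in CASES:
            # subTest reports each failing row separately instead of
            # stopping at the first one.
            with self.subTest(text=text, target=target):
                self.assertEqual(list(findallstr(text, target)), plain)
                self.assertEqual(
                    list(findallstr(text, target, overlapping=True)),
                    overlapped)
```

Saved as a module, this can be run with "python -m unittest" followed by the module name; a failing row is reported with its text and target rather than needing to be spotted by eye in printed output.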

HTH,
John


