finding sublist

Johan Lindberg johan at pulp.se
Tue Aug 2 13:33:05 EDT 2005


Hello.

> my target is implement a function controlla(string - a binary string-)
> that check if there is in a string two equal not overlapping
> subsequences at least of length limitem:
>
> my code:
> [snip]
>

I may have misunderstood it, but does your function work the way you
want it to?

>>>controlla("testeststest")
no

I can't get it to print anything other than "no". But then again, I'm
reading and posting via Google and I guess all those break statements
shouldn't be on the same indent-level.

> the question is: Is there a faster way doing that? I don't know,
> changing string into list or array, using map, or module, using
> different loop, regular expression,funcional programming , list
> comprehensions , sets, different looping techniques, i dont
> know.......(!)

Since you're using nested for loops when searching the string you
should make sure not to perform more iterations than neccessary. The
function below returns a list of all, non-overlapping, substrings in
text where len(substring)>= minLength. The outer loop is limited to
about half of the length of the text which is where most of the speed
comes from but I'm sure it can be tweaked for more.

def foo(text, minLength):
  result= []
  for length in range(minLength, len(text)/ 2+ 1):
    for start in range(len(text)):
      end= start+ length
      if end< len(text):
        part= text[start:end]
        if text.find(part, end)!= -1:
          if part not in result:
            result.append(part)

  return result

>>>foo("testeststest", 4)
['test', 'stes', 'stest']

>>>foo("testeststest", 3)
['tes', 'est', 'ste', 'test', 'stes', 'stest']

HTH
Johan Lindberg
johan at pulp.se




More information about the Python-list mailing list