[Tutor] longest common substring

Andreas Perstinger andreas.perstinger at gmx.net
Fri Nov 11 22:49:51 CET 2011


First, just a little rant :-)
It doesn't help to randomly change some lines or introduce new 
concepts you don't understand yet and then hope to get the right result. 
The chances that this will be successful are very small.
You should try to understand some basic concepts first and build on them.
From your postings over the last weeks, and especially from today, I 
have the impression that you still don't understand how fundamental 
programming concepts work: for-loops, the differences between data types 
(strings, lists, sets, ...).
Honestly, have you already read any programming tutorial? (You'll find a 
big list at http://wiki.python.org/moin/BeginnersGuide/NonProgrammers .) 
At the moment it looks like you are just copying code snippets from 
different places and then trying in vain to modify them to suit your 
needs. IMHO the problems you want to solve are a little too big for you 
right now.

Nevertheless, here are some comments:

> Based on former advice, I made a correction/modification on the below code.
>
> 1] the set and subgroup does not work, here I wish to put all the
> subgroup in a big set, the set like

That's a good idea, but you don't use the set correctly.

 > subgroups=[]
 > subgroup=[]
 > def LongestCommonSubstring(S1, S2):

I think it's better to move "subgroups" and "subgroup" into the 
function. (I've noticed that in most of your scripts you are using a lot 
of global variables. IMHO that's not the best programming style. Do you 
know what "global/local variables", "namespace", "scope" mean?)
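In case the terms are new, here is a minimal sketch of the difference between a global and a local name (the names "counter" and "count_local" are made up for this illustration):

```python
counter = 0              # global name, defined at module level

def count_local():
    counter = 10         # a new local name; the global one is untouched
    return counter

print(count_local())     # 10
print(counter)           # 0
```

A name assigned inside a function lives only in that function's namespace, which is why keeping "subgroups" and "subgroup" inside the function avoids surprises.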

You are defining "subgroups" as an empty list, but later you want to use 
it as a set. Thus, you should define it as an empty set:

subgroups = set()

You are also defining "subgroup" as an empty list, but later you assign 
a slice of "S1" to it. Since "S1" is a string, the slice is also a 
string. Therefore:

subgroup = ""
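You can check the types yourself in the interpreter. This is only a small sketch, and the sample string is my assumption, not your data:

```python
S1 = "abcdefg"          # hypothetical sample string

subgroups = set()       # empty set; note that {} would create an empty dict
subgroup = S1[2:5]      # slicing a string gives a string

print(type(subgroups).__name__)   # set
print(type(subgroup).__name__)    # str
print(subgroup)                   # cde
```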

 >      M = [[0]*(1+len(S2)) for i in xrange(1+len(S1))]

Peter told you already why "xrange" doesn't work in Python 3. But 
instead of using an alias like

xrange = range

IMHO it's better to change it in the code directly.
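For the line that builds the matrix, the direct change could look like this (a sketch with two made-up sample strings):

```python
S1, S2 = "abc", "abcd"   # hypothetical sample strings

# One row per character of S1 plus one extra row,
# one column per character of S2 plus one extra column
M = [[0] * (1 + len(S2)) for _ in range(1 + len(S1))]

print(len(M), len(M[0]))   # 4 5
```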

 >      longest, x_longest = 0, 0
 >      for x in xrange(1,1+len(S1)):
 >          for y in xrange(1,1+len(S2)):
 >              if S1[x-1] == S2[y-1]:
 >                  M[x][y] = M[x-1][y-1]+1
 >                  if M[x][y] > longest:
 >                      longest = M[x][y]
 >                      x_longest = x
 >                  if longest >= 3:
 >                      subgroup=S1[x_longest-longest:x_longest]
 >                      subgroups=set([subgroup])

Here, in the first iteration, you overwrite your original empty list 
"subgroups" with a set built from a one-element list, so the set 
contains the string "subgroup" as its only element. Do you really 
understand this line?
In every following iteration you overwrite this one-element set with 
another one-element set (containing the next "subgroup").
If you want to add an element to an existing set instead of replacing 
the whole set, use its "add()" method:

subgroups.add(subgroup)

This will add the string "subgroup" as a new element to the set "subgroups".
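To see the difference between overwriting and adding, compare the two variants side by side (the list "words" is made up for this illustration):

```python
words = ["abc", "bcd", "abc"]   # hypothetical substrings

overwritten = set()
collected = set()
for word in words:
    overwritten = set([word])   # replaces the whole set each time
    collected.add(word)         # adds to the existing set

print(overwritten)          # {'abc'}
print(sorted(collected))    # ['abc', 'bcd']
```

Only the last element survives in "overwritten", while "collected" keeps every distinct element.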

 >                      print(subgroups)
 >              else:
 >                      M[x][y] = 0
 >
 >      return S1[x_longest-longest:x_longest]

Here you probably want to return the set "subgroups":

return subgroups
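Putting the comments above together, the function could look roughly like this. Take it as a sketch, not as the one right solution: the function name and the parameter "min_length" are my assumptions, and it keeps your logic of recording the longest substring found so far once it has at least 3 characters.

```python
def longest_common_substrings(S1, S2, min_length=3):
    """Collect each new longest common substring of S1 and S2 as it is found."""
    subgroups = set()
    # M[x][y] = length of the common substring ending at S1[x-1] and S2[y-1]
    M = [[0] * (1 + len(S2)) for _ in range(1 + len(S1))]
    longest, x_longest = 0, 0
    for x in range(1, 1 + len(S1)):
        for y in range(1, 1 + len(S2)):
            if S1[x-1] == S2[y-1]:
                M[x][y] = M[x-1][y-1] + 1
                if M[x][y] > longest:
                    longest = M[x][y]
                    x_longest = x
                    # Adding only when "longest" changes gives the same set
                    # as adding on every match, because sets ignore duplicates
                    if longest >= min_length:
                        subgroups.add(S1[x_longest - longest:x_longest])
            else:
                M[x][y] = 0
    return subgroups

print(sorted(longest_common_substrings("xabcdy", "zabcdw")))   # ['abc', 'abcd']
```

Note that this returns every "longest so far" candidate, not all common substrings; whether that is what you want depends on your problem.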


> 2] I still have trouble in reading files, mainly about not read "" etc.

The problem is that your data files contain just one big one-line 
string. AFAIK you have produced these data files yourself, haven't you? 
In that case it would be better to change the way you save the data 
(be it as a well-formatted string, a list or something else) instead of 
trying to fix it here (in this script).
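Since I don't know what your data really looks like, this is only a guess at one possible format: if it is a list of strings, writing one item per line makes reading it back trivial. The items and the file name are made up for this sketch.

```python
import os
import tempfile

items = ["abcde", "bcdef", "cdefg"]   # hypothetical data

# Hypothetical file name inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "data.txt")

# Save: one item per line
with open(path, "w") as f:
    for item in items:
        f.write(item + "\n")

# Load: strip the trailing newline from each line
with open(path) as f:
    loaded = [line.strip() for line in f]

print(loaded == items)   # True
```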

Bye, Andreas

