best way to align words?

Noah Rawlins noah.rawlins at comcast.net
Thu Nov 30 21:25:57 EST 2006


Robert R. wrote:
> Hello,
> 
> i would like to write a piece of code to help me align some sequences
> of words and suggest the ordered common subwords among them
> 
> s0 = "this is an example of a thing i would like to have".split()
> s1 = "another example of something else i would like to have".split()
> s2 = 'and this is another " example " but of something ; now i would still like to have'.split()
> ...
> alist = (s0, s1, s2)
> 
> result should be : ('example', 'of', 'i', 'would', 'like', 'to', 'have')
> 
> but i do not know how i should start; do you perhaps have a helpful
> suggestion?
> one problem i have is that with many different strings my results tend
> to be empty, while i would still like to get one of the best matches,
> or maybe all of them.
> 
> best.
> 

Your requirements are a little vague... how are these three strings handled?

s1 = "hello there dudes"
s2 = "dudes hello there"
s3 = "there dudes hello"

They all share the same three words, but in what order do you want them back?

Here is a simplistic approach using sets. It returns the words that
appear in every string, ordered as they occur in the first string (an
arbitrary choice). It doesn't worry about matches, or missed matches,
caused by punctuation, case and stuff like that.

 >>> from functools import reduce  # required on Python 3
 >>> strList = []
 >>> strList.append('this is an example of a thing i would like to have')
 >>> strList.append('another example of something else i would like to have')
 >>> strList.append('and this is another " example " but of something ; now i would still like to have')
 >>> # the set of words present in every string
 >>> common = reduce(lambda x, y: x.intersection(y), [set(s.split()) for s in strList])
 >>> # keep the first string's words, in order, if they are in that set
 >>> [word for word in strList[0].split() if word in common]
['example', 'of', 'i', 'would', 'like', 'to', 'have']
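
If punctuation and case do matter, the same set idea can be wrapped in a
function that normalises the words first. This is only a sketch: the
names common_words and normalise are made up for illustration, and
dropping repeated words from the result is my own choice, not something
the one-liner above does.

```python
import string
from functools import reduce


def common_words(sentences):
    """Words present in every sentence, in first-sentence order.

    Hypothetical helper: assumes at least one sentence is given.
    """
    def normalise(sentence):
        # lower-case each token and strip leading/trailing punctuation;
        # tokens that were nothing but punctuation are dropped
        words = (w.strip(string.punctuation).lower() for w in sentence.split())
        return [w for w in words if w]

    token_lists = [normalise(s) for s in sentences]
    # intersection of the word sets of all sentences
    shared = reduce(set.intersection, (set(t) for t in token_lists))

    seen = set()
    result = []
    for w in token_lists[0]:
        # keep each shared word once, at its first position
        if w in shared and w not in seen:
            seen.add(w)
            result.append(w)
    return result


print(common_words(["Hello there, dudes!", "dudes hello THERE", "there dudes hello"]))
```

Note that this still only reports which words are shared; the order is
simply taken from the first string.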

But you still have issues with multiple matches and how they are
handled, etc...
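
If what you really want is for the words to appear in the same relative
order in every string, one possible direction (swapping the set
intersection for a different technique) is a longest common subsequence.
The sketch below does a standard two-sequence LCS and folds it over the
list pairwise. That folding is a heuristic: it is not guaranteed to find
the longest subsequence common to all the strings at once. The names lcs
and ordered_common are made up for illustration.

```python
def lcs(a, b):
    """One longest common subsequence of two word lists (dynamic programming)."""
    # table[i][j] holds the LCS length of a[i:] and b[j:]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # walk the table from the start to recover one subsequence
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def ordered_common(word_lists):
    """Fold lcs() over the lists pairwise (heuristic, assumes a non-empty input)."""
    result = list(word_lists[0])
    for words in word_lists[1:]:
        result = lcs(result, words)
    return result


s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()
print(ordered_common([s0, s1, s2]))
```

For the three "hello there dudes" strings above this comes back with a
single word, since no two-word sequence keeps its order in all three.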

noah


