LCS for word ? or array intersection ?

Sun Jan 26 12:04:48 EST 2003

gregtehrig at yahoo.fr (Greg Tehrig) wrote in message 
> > I've got a question for the people here. I think this might be done
> > with something like decorate/undecorate and LCS (Longest Common
> > Subsequence), but perhaps you have a better idea than mine.
> > 
> > for example :
> > s1 = """this is a line containing an example"""
> > s2 = """this is another sentence containing 12 words and it is another example"""
> > 
> > and I need the following output:
> > ["""this is""", """containing""", """example"""]
> > ["""a line""", """an example"""]

Not sure whether this is exactly what you're looking for, but:

>>> import lcs
>>> s1 = """this is a line containing an example"""
>>> s2 = """this is another sentence containing 12 words and it is another example"""
>>> lcs.longestCommonSubsequence(s1.split(" "), s2.split(" "))
['this', 'is', 'containing', 'example']

This uses the lcs code I had at
http://www.ics.uci.edu/~eppstein/161/python/lcs.py
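
For readers without access to that file, here is a minimal sketch of a dynamic-programming LCS over arbitrary sequences. It is not the code from lcs.py; the function name longest_common_subsequence is an assumption chosen for illustration, and when several longest subsequences exist it simply returns one of them.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of sequences a and b as a list."""
    n, m = len(a), len(b)
    # length[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                length[i][j] = 1 + length[i + 1][j + 1]
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])
    # Walk forward through the table to recover one optimal subsequence.
    result = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif length[i + 1][j] >= length[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


s1 = "this is a line containing an example"
s2 = "this is another sentence containing 12 words and it is another example"
print(longest_common_subsequence(s1.split(), s2.split()))
```

The table costs O(len(a) * len(b)) time and space, which is fine for sentences but may be too much for very long inputs.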

Undecorating to get the words of s1 not in the lcs (your second output) 
looks straightforward enough...
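
One possible way to get both lists as runs of adjacent words is sketched below. Note the swap: instead of the lcs module it uses the standard library's difflib.SequenceMatcher, whose matching blocks are found by a greedy longest-block heuristic and are not guaranteed to form a true longest common subsequence. The helper name common_and_leftover_runs is hypothetical.

```python
from difflib import SequenceMatcher


def common_and_leftover_runs(s1, s2):
    """Split s1 into runs of words shared with s2 and runs found only in s1."""
    words1, words2 = s1.split(), s2.split()
    # autojunk=False disables difflib's popularity heuristic for long inputs.
    matcher = SequenceMatcher(None, words1, words2, autojunk=False)
    common, leftover = [], []
    pos = 0  # index of the next word of words1 not yet assigned to a run
    for block in matcher.get_matching_blocks():
        if block.a > pos:
            # Words between two matching blocks belong to s1 only.
            leftover.append(" ".join(words1[pos:block.a]))
        if block.size:
            common.append(" ".join(words1[block.a:block.a + block.size]))
        pos = block.a + block.size
    return common, leftover


s1 = "this is a line containing an example"
s2 = "this is another sentence containing 12 words and it is another example"
common, leftover = common_and_leftover_runs(s1, s2)
print(common)    # ['this is', 'containing', 'example']
print(leftover)  # ['a line', 'an']
```

The second list comes out as 'a line' and 'an' rather than 'a line' and 'an example', because 'example' is already counted in the common part.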

-- 
David Eppstein       UC Irvine Dept. of Information & Computer Science
eppstein at ics.uci.edu http://www.ics.uci.edu/~eppstein/