LCS for words? Or array intersection?

Thu Jan 23 13:37:48 EST 2003

hi,

I have a question for the people here. I think this might be done
with something like decorate/undecorate and LCS (Longest Common
Subsequence), but perhaps you have a better idea than mine.

For example, given two sentences:
s1 = """this is a line containing an example"""
s2 = """this is another sentence containing 12 words and it is another
example"""

I need the following output (the word runs common to both, then the
runs found only in s1, then the runs found only in s2):
["""this is""", """containing""", """example"""]
["""a line""", """an example"""]
["""another sentence""", """12 words and it is another"""]

My first idea is to decorate each word with its length, run an LCS
algorithm (like the one in difflib), keep only the sequences that are
decorated with their length, and then finish with substring completion.

But I'm not sure that is the most obvious or the most efficient way.
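
To make the difflib part of that idea concrete, here is a minimal
sketch that skips the decorate step and compares lists of words
directly. It assumes words are split on whitespace, and the helper name
split_common is made up for illustration. Note that
difflib.SequenceMatcher looks for longest matching blocks, which is
close to but not strictly a longest common subsequence.

```python
from difflib import SequenceMatcher


def split_common(s1, s2):
    """Return (common, only_s1, only_s2) as lists of word runs.

    Hypothetical helper: compares the two sentences word by word.
    """
    a, b = s1.split(), s2.split()
    common, only_a, only_b = [], [], []
    matcher = SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Run of words present in both sentences, in the same order.
            common.append(" ".join(a[i1:i2]))
        else:
            # 'replace', 'delete' or 'insert': keep each non-empty side.
            if i2 > i1:
                only_a.append(" ".join(a[i1:i2]))
            if j2 > j1:
                only_b.append(" ".join(b[j1:j2]))
    return common, only_a, only_b


s1 = "this is a line containing an example"
s2 = ("this is another sentence containing 12 words "
      "and it is another example")
common, only_s1, only_s2 = split_common(s1, s2)
print(common)
print(only_s1)
print(only_s2)
```

On the example this reproduces the first and third lists. The second
list comes out as "a line" and "an" rather than "an example", because
"example" is already counted as common.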

Any suggestions are very welcome!

I was also wondering whether it couldn't be done with some kind of
array operator, like intersection followed by subtraction?
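
As a rough sketch of what I mean by intersection and subtraction, here
is the same example using Python sets, again assuming whitespace
splitting. The variable names are just for illustration. It shows the
catch: a set-based approach ignores word order and repetition, so the
second "is" in s2 is treated as common and adjacent words are not
grouped into runs.

```python
s1 = "this is a line containing an example"
s2 = ("this is another sentence containing 12 words "
      "and it is another example")

w1, w2 = s1.split(), s2.split()

# Intersection: words that occur anywhere in both sentences.
common = set(w1) & set(w2)

# "Subtraction", done with list comprehensions so the original word
# order (and repeated words) of each sentence is preserved.
only1 = [w for w in w1 if w not in common]
only2 = [w for w in w2 if w not in common]

print(sorted(common))
print(only1)
print(only2)
```

This gives single words rather than phrases such as "12 words and it is
another", so some sequence-aware step still seems to be needed.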

regards.