best way to align words?

bearophileHUGS at lycos.com
Thu Nov 30 18:49:14 EST 2006


Robert R.:
> I would like to write a piece of code to help me align some sequences
> of words and suggest to me the ordered common subwords of them [...]
> A problem I have is that when there are many different strings my results
> tend to be empty, while I would still like to get one of, or maybe all of,
> the best matches.

This is my first attempt at a solution; surely there are faster, shorter and
better solutions...


from collections import defaultdict
from itertools import chain
from graph import Graph
# http://sourceforge.net/projects/pynetwork/

def commonOrdered(*strings):
    lists = [[w for w in string.lower().split() if w.isalpha()]
             for string in strings]

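    # Count each word once per string that contains it, so a word
    # repeated inside a single string is not mistaken for a common one.
    freqs = defaultdict(int)
    for w in chain(*[set(words) for words in lists]):
        freqs[w] += 1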

    g = Graph()
    for words in lists:
        g.addPath(words)

    len_strings = len(strings)
    return [w for w in g.toposort() if freqs[w]==len_strings]


s0 = "this is an example of a thing i would like to have"
s1 = "another example of something else i would like to have"
s2 = ('and this is another " example " but of something ; '
      'now i would still like to have')

print(commonOrdered(s0, s1, s2))

It builds a directed graph from the word sequences (each string adds a path
linking its words in order), sorts the graph topologically, and then keeps
only the words of that ordering that are present in all the original strings.
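With a bit of work the code can also handle words with punctuation attached,
such as "example" written with the quotes touching it: as written, the
isalpha() filter drops such tokens, so only the spaced-out " example " is
recognized.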
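One possible way to do that, sketched here in Python 3 as an assumption rather than the method the post has in mind: strip punctuation from both ends of each token before the isalpha() test. The helper name words_of is hypothetical; its result could replace each inner list built in commonOrdered. Apostrophes inside a word (e.g. "don't") still fail isalpha(), so such words are still dropped.

```python
import string


def words_of(text):
    """Hypothetical helper: the lowercase words of text, with punctuation
    trimmed from both ends of each whitespace-separated token."""
    words = []
    for token in text.lower().split():
        token = token.strip(string.punctuation)  # '"example"' -> 'example'
        if token.isalpha():  # empty leftovers (pure punctuation) and "don't" fail here
            words.append(token)
    return words


print(words_of('and this is another "example" but of something; now'))
```
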
An xtoposort method could also be added to the Graph class...
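For readers without the pynetwork package, here is a minimal, dependency-free sketch of the same approach in Python 3. It is not the pynetwork implementation: it builds the word graph by hand and uses Kahn's algorithm for the topological sort. The min_count parameter is a hypothetical addition, not in the original code; setting it below the number of strings keeps words shared by most (not all) strings, which is one possible answer to the empty-result problem in the question. The sketch raises ValueError when the strings put some words in conflicting orders (e.g. "a b" and "b a"), since then no topological order exists; the post does not say what Graph.toposort does in that case.

```python
from collections import defaultdict, deque


def common_ordered(*strings, min_count=None):
    """Words found in at least min_count of the strings (default: all of them),
    listed in an order consistent with the word order of every string.

    Raises ValueError if the strings order some words inconsistently,
    because then the word graph has a cycle and no topological order exists.
    """
    lists = [[w for w in s.lower().split() if w.isalpha()] for s in strings]
    if min_count is None:
        min_count = len(lists)

    # In how many strings each word occurs (counted once per string).
    counts = defaultdict(int)
    for words in lists:
        for w in set(words):
            counts[w] += 1

    # Word graph: an edge from each word to the word that follows it.
    # Dicts keep insertion order (Python 3.7+), so ties are broken deterministically.
    succ = defaultdict(dict)
    indegree = {}
    for words in lists:
        for w in words:
            indegree.setdefault(w, 0)
        for a, b in zip(words, words[1:]):
            if b not in succ[a]:
                succ[a][b] = True
                indegree[b] += 1

    # Kahn's algorithm: repeatedly emit a word with no remaining predecessors.
    queue = deque(w for w, d in indegree.items() if d == 0)
    ordered = []
    while queue:
        w = queue.popleft()
        ordered.append(w)
        for nxt in succ[w]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if len(ordered) < len(indegree):
        raise ValueError("the strings order some words inconsistently")
    return [w for w in ordered if counts[w] >= min_count]


s0 = "this is an example of a thing i would like to have"
s1 = "another example of something else i would like to have"
s2 = ('and this is another " example " but of something ; '
      'now i would still like to have')
print(common_ordered(s0, s1, s2))
# ['example', 'of', 'i', 'would', 'like', 'to', 'have']
print(common_ordered(s0, s1, s2, min_count=2))
# ['this', 'is', 'another', 'example', 'of', 'something', 'i', 'would', 'like', 'to', 'have']
```

On these three strings every kept word is linked to the next one by some path in the graph, so the order of the result does not depend on how Kahn's algorithm breaks ties.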

Bye,
bearophile
