best way to align words?
bearophileHUGS at lycos.com
Thu Nov 30 18:49:14 EST 2006
Robert R.:
> i would like to write a piece of code to help me to align some sequence
> of words and suggest me the ordered common subwords of them [...]
> a trouble i have if when having many different strings my results tend
> to be nothing while i still would like to have one of the, or maybe,
> all the best matches.
This is my first attempt at a solution; surely there are faster, shorter,
better solutions...
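from collections import defaultdict
from itertools import chain
from graph import Graph
# http://sourceforge.net/projects/pynetwork/

def commonOrdered(*strings):
    # Lowercase, split on whitespace, keep only purely alphabetic tokens.
    lists = [[w for w in string.lower().split() if w.isalpha()]
             for string in strings]
    # Count in how many strings each word occurs (at most once per string).
    freqs = defaultdict(int)
    for w in chain(*[set(words) for words in lists]):
        freqs[w] += 1
    # Add each word sequence to the graph as a path.
    g = Graph()
    for words in lists:
        g.addPath(words)
    len_strings = len(strings)
    # Keep, in topological order, the words found in every string.
    return [w for w in g.toposort() if freqs[w] == len_strings]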

s0 = "this is an example of a thing i would like to have"
s1 = "another example of something else i would like to have"
s2 = ('and this is another " example " but of something ; now i would '
      'still like to have')
print(commonOrdered(s0, s1, s2))

It creates a graph from the paths of words, sorts the graph
topologically, then keeps only the words of that ordering that are
present in all the original strings.
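
For readers without the graph module, the same idea can be sketched with the standard library only. This is an illustration, not the pynetwork implementation: the name common_ordered, the dict-based graph and the hand-written Kahn-style topological sort are all assumptions of mine, and it is written for Python 3. It raises ValueError when the strings disagree on word order, because then no topological order exists.

```python
from collections import defaultdict, deque

def common_ordered(*strings):
    """Words present in every string, in an order compatible with all of them."""
    lists = [[w for w in s.lower().split() if w.isalpha()] for s in strings]

    # Number of strings that contain each word (counted once per string).
    freqs = defaultdict(int)
    for words in lists:
        for w in set(words):
            freqs[w] += 1

    # Build a directed graph: an edge a -> b for each pair of adjacent words.
    succ = defaultdict(set)
    indeg = defaultdict(int)
    nodes, seen = [], set()          # nodes kept in first-seen order
    for words in lists:
        for w in words:
            if w not in seen:
                seen.add(w)
                nodes.append(w)
        for a, b in zip(words, words[1:]):
            if a != b and b not in succ[a]:
                succ[a].add(b)
                indeg[b] += 1

    # Kahn's algorithm: repeatedly emit nodes with no remaining predecessors.
    queue = deque(w for w in nodes if indeg[w] == 0)
    order = []
    while queue:
        w = queue.popleft()
        order.append(w)
        for nxt in sorted(succ[w]):
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("the strings disagree on word order (cycle in graph)")

    return [w for w in order if freqs[w] == len(strings)]

s0 = "this is an example of a thing i would like to have"
s1 = "another example of something else i would like to have"
s2 = ('and this is another " example " but of something ; now i would '
      'still like to have')
print(common_ordered(s0, s1, s2))
```

On the three example strings the common words form a single chain in the graph, so the result does not depend on how ties are broken during the sort.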
With a bit of work the code can also be used on text where the
punctuation is attached to the words, like "example" instead of
" example ". As written, the isalpha() test discards such tokens.
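
One possible way to do that bit of work, as a rough sketch: extract runs of letters with a regular expression instead of splitting on whitespace and testing isalpha(). The helper name tokenize is hypothetical, and the behaviour differs slightly from the original filter: a token like don't is split into two words here, where the isalpha() test would drop it entirely.

```python
import re

def tokenize(text):
    # Hypothetical helper: lowercase the text and return its runs of letters,
    # ignoring digits, underscores, quotes and other punctuation.
    return re.findall(r"[^\W\d_]+", text.lower())

print(tokenize('and this is another "example" but of something; now'))
```

The list comprehension that builds the word lists could then call tokenize(string) in place of the split/isalpha filter.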
An xtoposort method could also be added to the Graph class...
Bye,
bearophile