Merging ordered lists

etal eric.talevich at gmail.com
Mon Jun 2 18:02:49 EDT 2008


On Jun 1, 1:49 am, Peter Otten <__pete... at web.de> wrote:
> Peter Otten wrote:
> > #untested
>
> Already found two major blunders :(
>
> # still untested
> import difflib
>
> def _merge(a, b):
>     sm = difflib.SequenceMatcher(None, a, b)
>     for op, a1, a2, b1, b2 in sm.get_opcodes():
>         if op == "insert":
>             yield b[b1:b2]
>         elif op == "replace":
>             yield a[a1:a2]
>             yield b[b1:b2]
>         else: # delete, equal
>             yield a[a1:a2]
>
> def merge(a, b):
>     return sum(_merge(a, b), [])
>
> def merge_to_unique(sources):
>     return unique(reduce(merge, sorted(sources, key=len, reverse=True)))
>

difflib.SequenceMatcher looks promising; I'll try it. Thanks!


> def unique(items):
>     u = set(items)
>     if len(u) == len(items):
>         return items
>     result = []
>     for item in items:
>         if item in u:
>             result.append(item)
>             u.remove(item)
>     return result

You did right by preserving the original (non-alphabetical) ordering,
but I'm less enthusiastic about the shape of this function. My
original function used 7 lines of code, and only 1 for the unique()
step. This uses up to three container objects. Is it really an
improvement?

(Secret: the reference list (or, any of the sources) is unlikely to be
more than a few dozen elements long. The data set that puts
merge_to_unique through a workout will be a giant list of
comparatively short lists, so the unique() part just needs to be short
and conceptually clean, while merge() should attempt sane behavior for
large len(sources).)



More information about the Python-list mailing list