Merging ordered lists
Raymond Hettinger
python at rcn.com
Tue Jun 3 02:08:06 EDT 2008
> Thanks for the tip; itertools never ceases to amaze. One issue:
> groupby doesn't seem to remove all duplicates, just consecutive ones
> (for lists of strings and integers, at least):
>
> >>> [k for k, g in itertools.groupby(list("asdfdfffdf"))]
>
> ['a', 's', 'd', 'f', 'd', 'f', 'd', 'f']
That's why the example ran imerge() first. That takes several sorted
inputs and gives a single sorted output. Then, groupby() eliminates
the dups from the sorted input where the equal valued entries have
been brought together.
> Another issue: dropping everything into a heap and pulling it back out
> looks like it loses the original ordering, which isn't necessarily
> alphabetical, but "however the user wants to organize the
> spreadsheet". That's why I originally avoided using
> sorted(set(itertools.chain(*sources))). Do you see another way around
> this?
If your inputs were already sorted, then the above algorithms maintain
that order and eliminate duplicates.
If the inputs were not sorted, then I don't think you have a precise
idea of what it means to merge them while preserving order. For
example if the inputs are XYZPDQ and bYlmPz, then what does a merged
sequence look like once the Y and P duplicates are removed? Is it
XZPDQblmz or some intermix of the two like XbZlPmDzQ? If it is the
first of those, then the answer is simple:
seen = set()
for elem in chain(*sources):
if elem in seen:
continue
yield elem
seen.add(elem)
Raymond
More information about the Python-list
mailing list