Merging ordered lists

etal eric.talevich at gmail.com
Mon Jun 2 19:04:33 EDT 2008


On Jun 1, 12:34 am, Raymond Hettinger <pyt... at rcn.com> wrote:
>
> I would do it two steps.  There's a number of ways to merge depending
> on whether everything is pulled into memory or not:
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/491285
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/305269
>
> After merging, the groupby itertool is good for removing duplicates:
>
>    result = [k for k, g in groupby(imerge(*sources))]
>
> Raymond
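The merge-then-groupby idea can be sketched with `heapq.merge` from the standard library, used here as a stand-in for the `imerge` function in the quoted snippet (assumed to be a lazy merge of sorted iterables). This is a minimal sketch that assumes every source is already sorted ascending; `merge_unique` is a hypothetical helper name.

```python
import heapq
from itertools import groupby


def merge_unique(*sources):
    """Merge already-sorted iterables and drop duplicates (hypothetical helper).

    Assumes each source is sorted ascending. heapq.merge stands in for
    the imerge function from the quoted snippet.
    """
    merged = heapq.merge(*sources)  # lazily yields items in sorted order
    # In a sorted stream, equal items are adjacent, so each groupby run
    # holds every copy of one value; keep a single key per run.
    return [k for k, _ in groupby(merged)]


print(merge_unique([1, 3, 5, 7], [2, 3, 6], [3, 7, 8]))
# [1, 2, 3, 5, 6, 7, 8]
```

The deduplication is only complete because the merged stream is sorted; on unsorted input, groupby collapses runs, not all repeats.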

Thanks for the tip; itertools never ceases to amaze. One issue:
groupby doesn't remove all duplicates, only consecutive ones:

>>> [k for k, g in itertools.groupby(list("asdfdfffdf"))]
['a', 's', 'd', 'f', 'd', 'f', 'd', 'f']
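groupby groups runs of equal adjacent items by design, so it only removes every duplicate when equal items are already next to each other. Removing non-adjacent duplicates requires remembering what has been seen. The sketch below is modeled on the `unique_everseen` recipe from the itertools documentation; it keeps the first occurrence of each item in its original position and assumes the items are hashable.

```python
def unique_everseen(iterable):
    """Yield each item the first time it appears, dropping later repeats.

    Unlike groupby, this also catches duplicates that are not adjacent.
    Items must be hashable because they are tracked in a set.
    """
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item


print(list(unique_everseen("asdfdfffdf")))
# ['a', 's', 'd', 'f']
```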


Another issue: dropping everything into a heap and pulling it back out
looks like it loses the original ordering, which isn't necessarily
alphabetical; it's "however the user wants to organize the
spreadsheet". That's why I originally avoided using
sorted(set(itertools.chain(*sources))). Do you see another way around
this?
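One possible way around this, assuming each source list is already in the user's preferred order and that the lists don't contradict one another, is to treat every adjacent pair in a source as the constraint "a comes before b" and topologically sort the combined items. This swaps in a technique not proposed in the thread (Kahn's topological sort); `merge_ordered` is a hypothetical name, and ties between unconstrained items are broken by first appearance across the sources.

```python
import heapq
from itertools import chain


def merge_ordered(*sources):
    """Merge lists so every source's relative order is preserved (hypothetical).

    Sources must be sequences (e.g. lists) without internal duplicates.
    Adjacent items in each source become "a before b" edges; Kahn's
    topological sort then orders all items, breaking ties by first
    appearance. Raises ValueError if the sources give conflicting orders.
    """
    first_seen = {}   # item -> index of its first appearance
    successors = {}   # item -> items that must come after it
    indegree = {}     # item -> number of distinct predecessors
    for item in chain(*sources):
        if item not in first_seen:
            first_seen[item] = len(first_seen)
            successors[item] = set()
            indegree[item] = 0
    for src in sources:
        for a, b in zip(src, src[1:]):
            if a != b and b not in successors[a]:
                successors[a].add(b)
                indegree[b] += 1
    # Min-heap of items with no pending predecessors, keyed by first
    # appearance; indices are unique, so items themselves are never compared.
    ready = [(first_seen[x], x) for x, d in indegree.items() if d == 0]
    heapq.heapify(ready)
    result = []
    while ready:
        _, item = heapq.heappop(ready)
        result.append(item)
        for nxt in successors[item]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, (first_seen[nxt], nxt))
    if len(result) != len(first_seen):
        raise ValueError("sources give conflicting orders")
    return result


print(merge_ordered(["name", "email", "phone"],
                    ["name", "address", "phone"],
                    ["email", "zip"]))
# ['name', 'email', 'address', 'phone', 'zip']
```

For the same input, sorted(set(itertools.chain(*sources))) would give the alphabetical ['address', 'email', 'name', 'phone', 'zip'], discarding the spreadsheet order. The first-appearance tie-break is one arbitrary but deterministic choice; a different policy could be substituted when two items are never ordered relative to each other.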



More information about the Python-list mailing list