Merging ordered lists

etal eric.talevich at gmail.com
Sun Jun 1 01:00:19 EDT 2008


Here's an algorithm question: How should I efficiently merge a
collection of mostly similar lists, with different lengths and
arbitrary contents, while eliminating duplicates and preserving order
as much as possible?

My code:

def merge_to_unique(sources):
    """Merge the unique elements from each list in sources into new
list.

    Using the longest input list as a reference, merges in the
elements from
    each of the smaller or equal-length lists, and removes duplicates.

    @return: Combined list of elements.
    """
    sources.sort(None, len, True)    # Descending length
    ref = sources[0]
    for src in sources[1:]:
        for i, s in enumerate(src):
            if s and (ref[i] != s) and s not in ref:
                ref.insert(ref.index(src[i-1])+1, s)
    # Remove duplicates
    return [r for i, r in enumerate(ref) if r and r not in ref[i+1:]]


This comes up with using the CSV module's DictWriter class to merge a
set (list, here) of not-quite-perfect CSV sources. The DictWriter
constructor needs a list of field names so that it can convert
dictionaries into rows of the CSV file it writes. Some of the input
CSV files are missing columns, some might have extras -- all of this
should be accepted, and the order of the columns in the merged file
should match the order of the input files as much as possible (not
alphabetical). All of the list elements are strings, in this case, but
it would be nice if the function didn't require it.

Speed actually isn't a problem yet; it might matter some day, but for
now it's just an issue of conceptual aesthetics. Any suggestions?



More information about the Python-list mailing list