Merging lists has made my brain hurt.

Thorsten Kampe thorsten at thorstenkampe.de
Sun Oct 6 12:03:11 EDT 2002


* Alex Martelli
> The intersection of two "sets" (Sets per se are only added in Python
> 2.3, but you can use either lists OR dictionaries in Python 2.2 as
> sets -- all you need is iterability and ability to use 'in' to test
> membership) is pretty easy:
> [...]
> dictionaries are far faster than lists for membership-tests and
> for deletion of items.  Whether it's worth building the dicts
> corresponding to your lists depends on how long are your lists:
> try both ways, pick the faster one if it matters.

Treating lists a priori as sets for reasons of speed is definitely not 
the way to go because you're "losing information". I think it's better 
to treat them as tuples (in a mathematical sense) whereby you can 
always delete 'duplicates' afterwards if you don't care about them.

Example: What's the intersection between [2, 1, 2, 3, 2] and
[0, 1, 2, 3, 2, 4]? An intersection (union, symmetric difference) of 
tuples can only be meaningful for the /corresponding/ items of the 
tuples/lists, so it's neither [2, 1, 3] nor [2, 1, 2, 3, 2] but
[2, 1, 2, 3].

I tried to write a reliable (though maybe slow) program that takes 
care of this issue (especially that intersection is _commutative_):

#v+
def boolean(seq0, seq1, boolean_modus):
    """" perform 'and', 'not', 'or', or 'xor' operation on sequences """
    if boolean_modus == 'and':    # intersection
        seq1 = seq1[:]
        intersection = []
        for item in seq0:
            if item in seq1:
                intersection.append(item)
                seq1.remove(item)
        return intersection

    elif boolean_modus == 'not':  # set difference
        setdiff = seq0[:]
        for item in seq1:
            if item in setdiff:
                setdiff.remove(item)
        return setdiff

    elif boolean_modus == 'or':   # union
        return seq0 + boolean(seq1, seq0, 'not')

    elif boolean_modus == 'xor':  # symmetric difference
        return boolean(boolean(seq0, seq1, 'or'), boolean(seq0, seq1, 'and'), 'not')
#v-


Thorsten



More information about the Python-list mailing list