[Tutor] List intersect

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Mon Sep 27 20:52:56 CEST 2004



> > I never wrote a python script. I am looking for a script that I can use
> > to generate a uniq list out of five lists. These lists are bunch of
> > usernames.

[text cut]

> Unless I'm mistaken about what you're trying to do, writing a Python
> script sounds like making things harder than necessary. Assuming you're
> on a Unix-like system and your lists are in separate files:
>
> cat list1 list2 list3 list4 list5 | uniq > newlist


Yes, the Unix shell solution should be really straightforward.  But don't
forget to sort!

    $ cat list1 list2 list3 list4 list5 | sort | uniq > newlist


The 'uniquing' algorithm that uniq uses won't see duplicates unless
they're adjacent to each other.  The 'uniq' utility does something like
this:

###
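def unique(sequence):
    # Collapse runs of adjacent duplicates, the way 'uniq' does.
    # An empty or one-element sequence has nothing to collapse.
    if len(sequence) <= 1:
        return sequence
    results = [sequence[0]]
    for i in range(1, len(sequence)):
        # Keep an element only if it differs from its predecessor.
        if sequence[i] != sequence[i-1]:
            results.append(sequence[i])
    return results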
###

And we can see that it works, just as long as the sequence is sorted:

###
>>> unique([1, 1, 2, 4, 6, 8, 9, 9, 10])
[1, 2, 4, 6, 8, 9, 10]
###


But if the elements are not in sorted order, then unique() won't catch all
duplicate elements:

###
>>> import random
>>> l = [1, 1, 2, 4, 6, 8, 9, 9, 10]
>>> random.shuffle(l)
>>> l
[4, 6, 8, 1, 10, 1, 9, 9, 2]
>>> unique(l)
[4, 6, 8, 1, 10, 1, 9, 2]
###

So if you use this approach, don't forget to sort first.
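
Here is a rough sketch of what "sort first" could look like in Python.  The
name sorted_unique() is just something I'm making up for illustration, and
it assumes the elements can be compared with each other (usernames as
strings are fine):

```python
def sorted_unique(sequence):
    """Return the distinct elements of sequence, in sorted order."""
    results = []
    # sorted() puts equal elements next to each other, so comparing
    # each element against the last one we kept is enough.
    for element in sorted(sequence):
        if not results or element != results[-1]:
            results.append(element)
    return results


print(sorted_unique([4, 6, 8, 1, 10, 1, 9, 9, 2]))
```

Note that the result comes back sorted, so the original order of the
elements is lost, just as it is with 'sort | uniq'.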



An alternative way to solve the uniqueness problem is to use a dictionary
or a set to collect the elements.  A key can appear in a dictionary only
once, so duplicates disappear without any sorting.  All of the tutorials on:

    http://www.python.org/topics/learn/non-prog.html

should cover how to use dictionaries.
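
As a minimal sketch of the dictionary idea, and of the same thing with a
set: the function names and the usernames below are made up for
illustration, and I'm assuming the lists have already been read into
memory as lists of strings.

```python
def unique_with_dict(*lists):
    """Merge several lists, returning each element once, in sorted order."""
    seen = {}
    for names in lists:
        for name in names:
            # Storing the same key again just overwrites the old entry.
            seen[name] = True
    return sorted(seen)


def unique_with_set(*lists):
    """Same idea with a set, which stores only the keys."""
    seen = set()
    for names in lists:
        seen.update(names)
    return sorted(seen)


# Made-up usernames standing in for the contents of two of the lists.
list1 = ["alice", "bob"]
list2 = ["bob", "carol"]
print(unique_with_dict(list1, list2))
print(unique_with_set(list1, list2))
```

The sorted() call at the end is only there to give a predictable order;
the dictionary or set has already removed the duplicates by that point.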


