sorted unique elements from a list; using 2.3 features

Andrew Dalke adalke at mindspring.com
Sat Jan 4 14:26:18 EST 2003


I often need to get the subset of unique elements from a list, in
sorted order.  For example, if I have
    [1, 4, 3, 4, 3, 1]
I want to get
    [1, 3, 4]

The usual way to do this is

# Let 'data' be a list or iterable object
# For example, data = [1, 4, 3, 4, 3, 1]
#          or, data = sys.stdin
d = {}
for x in data:
   d[x] = 1
subset = d.keys()
subset.sort()
# Use 'subset' as needed

Python 2.3 offers at least two new ways to do this.  The first is
with the new 'Set' class

# Let 'data' be a list or iterable object
import sets
subset = list(sets.Set(data))
subset.sort()
# Use 'subset' as needed

(The 'list()' is needed because that's the only way to get the elements
out of a Set as a list.  A Set provides an __iter__ but no 'tolist()'
method.)
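
The Set approach can also be wrapped in a small function.  This is only
a sketch: the name 'unique_sorted_via_set' is made up for illustration,
and the fallback to a built-in 'set' type is an assumption for
interpreters that do not ship the 'sets' module.

```python
try:
    from sets import Set      # the 'sets' module new in Python 2.3
except ImportError:
    Set = set                 # assumption: a built-in set type exists instead

def unique_sorted_via_set(data):
    # Hypothetical helper name.  'data' may be any iterable of
    # hashable, mutually comparable items.
    subset = list(Set(data))  # the Set drops duplicates; list() pulls them out
    subset.sort()
    return subset
```

Calling unique_sorted_via_set([1, 4, 3, 4, 3, 1]) gives [1, 3, 4].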


The other is with the new 'fromkeys' class method of dict, which
constructs a dictionary from a list -- the elements of the list become
the keys and you can choose the value for the items, or just use the
default of None.  To show you what that means

>>> dict.fromkeys([1,2,5,3,2,1], 0)
{1: 0, 2: 0, 3: 0, 5: 0}
>>> dict.fromkeys([1,2,5,3,2,1])
{1: None, 2: None, 3: None, 5: None}

So for the task at hand,

# Let 'data' be a list or iterable object
subset = dict.fromkeys(data).keys()
subset.sort()
# Use 'subset' as needed
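
For reuse, the fromkeys idiom can be wrapped in a function.  A minimal
sketch follows; the name 'sorted_unique' is made up for illustration,
and it calls list() on the dictionary instead of using .keys(), a choice
made here so that it does not rely on keys() returning a real list.

```python
def sorted_unique(data):
    # 'sorted_unique' is a hypothetical helper name, not part of Python.
    # Iterating a dictionary yields its keys, so list() collects the
    # unique elements without depending on what .keys() returns.
    subset = list(dict.fromkeys(data))
    subset.sort()
    return subset

print(sorted_unique([1, 4, 3, 4, 3, 1]))
```

This prints [1, 3, 4].  As with the other versions, the elements must be
hashable (to be dictionary keys) and comparable to each other (to sort).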


For a real-life example, suppose you want to get unique lines
from the stdin input stream, sort them, and dump the results
to stdout.  Here's how to do it in Python 2.3

import sys
unique_lines = dict.fromkeys(sys.stdin).keys()
unique_lines.sort()
sys.stdout.writelines(unique_lines)
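
The same script can be written so the streams are passed in, which
makes it easy to try on something other than stdin.  This is a sketch
under assumptions: 'write_unique_sorted' and its parameter names are
made up, 'infile' is anything that yields lines when iterated, and
'outfile' is anything with a writelines() method.

```python
def write_unique_sorted(infile, outfile):
    # Hypothetical helper name.  Each distinct line becomes a
    # dictionary key, so duplicate lines collapse to one entry.
    unique_lines = list(dict.fromkeys(infile))
    unique_lines.sort()
    outfile.writelines(unique_lines)

# To get the behavior of the script above:
#     import sys
#     write_unique_sorted(sys.stdin, sys.stdout)
```

Lines are compared with their line endings included, so a final line
that lacks a trailing newline is treated as different from an otherwise
identical line that has one.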

					Andrew
					dalke at dalkescientific.com




