sorted unique elements from a list; using 2.3 features

Steve Holden sholden at holdenweb.com
Mon Jan 6 15:07:41 EST 2003


"Andrew Dalke" <adalke at mindspring.com> wrote in message
news:av7cuu$cu2$1 at slb0.atl.mindspring.net...
> I often need to get the subset of unique elements from a list, in
> sorted order.  For example, if I have
>     [1, 4, 3, 4, 3, 1]
> I want to get
>     [1, 3, 4]
>
> The usual way to do this is
>
> # Let 'data' be a list or iterable object
> # For example, data = [1, 4, 3, 4, 3, 1]
> #          or, data = sys.stdin
> d = {}
> for x in data:
>    d[x] = 1
> subset = d.keys()
> subset.sort()
> # Use 'subset' as needed
>
> Python 2.3 offers at least two new ways to do this.  The first is
> with the new 'Set' class
>
> # Let 'data' be a list or iterable object
> import sets
> subset = list(sets.Set(data))
> subset.sort()
> # Use 'subset' as needed
>
> (The 'list()' is needed because that's the only way to get elements
> out of a Set.  A Set provides an __iter__ but no 'tolist()' method.)
>
Presumably (I don't have 2.3 to hand, so I may get shot down in flames)
since a set has __iter__() one could also use

    subset = [x for x in sets.Set(data)]

to create the list you want to sort? I'm not convinced this is a win,
though, so timings might be interesting.
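
A rough way to check the timing question is sketched below. It assumes a
current Python, where 'set' is a built-in and stands in for sets.Set from
2.3; the names via_list and via_comprehension are made up for illustration.
It only prints the two timings and makes no claim about which one wins:

```python
import timeit

# Sample input; any iterable of hashable items would do.
data = [1, 4, 3, 4, 3, 1] * 1000


def via_list(items):
    # Build the list directly from the set's iterator.
    return list(set(items))


def via_comprehension(items):
    # Build the same list with an explicit list comprehension.
    return [x for x in set(items)]


# Both approaches yield the same elements; only the construction differs.
assert sorted(via_list(data)) == sorted(via_comprehension(data))

t_list = timeit.timeit(lambda: via_list(data), number=200)
t_comp = timeit.timeit(lambda: via_comprehension(data), number=200)
print("list():        %.4f s" % t_list)
print("comprehension: %.4f s" % t_comp)
```

The numbers will vary with the machine, the Python version and the shape of
the data, so they are worth re-running on realistic input.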

>
> The other is with the new 'fromkeys' class method, which constructs
> a dictionary from a list -- the elements of the list become the
> keys and you can choose the value for the item, or just use the
> default of None.  To show you what that means
>
>  >>> dict.fromkeys([1,2,5,3,2,1], 0)
> {1: 0, 2: 0, 3: 0, 5: 0}
>  >>> dict.fromkeys([1,2,5,3,2,1])
> {1: None, 2: None, 3: None, 5: None}
>  >>>
>
> So for the task at hand,
>
> # Let 'data' be a list or iterable object
> subset = dict.fromkeys(data).keys()
> subset.sort()
> # Use 'subset' as needed
>
>
> For a real-life example, suppose you want to get unique lines
> from the stdin input stream, sort them, and dump the results
> to stdout.  Here's how to do it in Python 2.3
>
> import sys
> unique_lines = dict.fromkeys(sys.stdin).keys()
> unique_lines.sort()
> sys.stdout.writelines(unique_lines)
>
Of course, because your specification contains "in sorted order" you must
call the list's sort() method at some stage, but the more interesting aspect
of this post is the new ways to get at the unique elements.
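
To make the comparison concrete, here is a small sketch of the three
approaches side by side. It is written for a current Python rather than 2.3,
so it uses the built-in 'set' in place of sets.Set and wraps the dictionary
keys in list() before sorting; the function names are hypothetical:

```python
def unique_sorted_dict_loop(data):
    # The traditional idiom: use dictionary keys to drop duplicates.
    d = {}
    for x in data:
        d[x] = 1
    subset = list(d.keys())
    subset.sort()
    return subset


def unique_sorted_fromkeys(data):
    # dict.fromkeys builds the same dictionary in a single call.
    subset = list(dict.fromkeys(data))
    subset.sort()
    return subset


def unique_sorted_set(data):
    # A set drops duplicates; sorted() returns a new sorted list.
    return sorted(set(data))


sample = [1, 4, 3, 4, 3, 1]
for func in (unique_sorted_dict_loop, unique_sorted_fromkeys, unique_sorted_set):
    print(func.__name__, func(sample))
```

All three need the elements to be hashable and mutually comparable; they
differ only in how the duplicates are removed.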

When sorting isn't required, of course, it should be perfectly possible to
simply write:

    for element in sets.Set(data):
        do_stuff_with(element)

and no list conversion is necessary since the iterator does what it's
supposed to (i.e.: support iteration :-)
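
A minimal runnable version of that loop, again assuming the built-in 'set'
of a current Python instead of sets.Set, with do_stuff_with as a placeholder
that just records what it was given. Each distinct element is visited
exactly once, but the order of the visits should not be relied upon:

```python
data = [1, 4, 3, 4, 3, 1]
seen = []


def do_stuff_with(element):
    # Placeholder for real work: remember each element we were handed.
    seen.append(element)


# Iterate over the set directly; no intermediate list is built.
for element in set(data):
    do_stuff_with(element)

print(len(seen), "distinct elements processed")
```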

regards
-----------------------------------------------------------------------
Steve Holden                                  http://www.holdenweb.com/
Python Web Programming                 http://pydish.holdenweb.com/pwp/
Bring your musical instrument to PyCon!    http://www.python.org/pycon/
-----------------------------------------------------------------------