sorted unique elements from a list; using 2.3 features

Andrew Dalke adalke at mindspring.com
Mon Jan 6 05:23:48 EST 2003


Delaney, Timothy wrote:
> Using sets is definitely the Right Way (TM) to do it. This is one of the
> primary use cases for sets (*everyone* wants to do this).

I'm not convinced, for a few reasons:

   - Sets are slower than a simple dict (because, after all, Sets
     are built on top of a dict but with extra overhead).  I just
     tested it: dict.fromkeys takes about 20% less time than Set
     does

 >>> import time, sets, random
 >>> data = [random.randrange(1000000) for i in range(2000000)]
 >>> def do_set():
...   return len(sets.Set(data))
...
 >>> def do_dict():
...   return len(dict.fromkeys(data).keys())
...
 >>> t1=time.clock();do_set();t2=time.clock()
865149
 >>> t2-t1
2.9100000000000001
 >>> t1=time.clock();do_dict();t2=time.clock()
865149
 >>> t2-t1
2.3299999999999983
 >>> 2.33/2.9
0.80344827586206902
 >>>


   - there's the extra import, which is a bit tedious if you don't
     need the power of a Set

   - dicts are a basic part of Python, so learning another way to
     construct a dict is a smaller step than learning a whole new
     class
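For readers on a current Python, where set has been a builtin since 2.4 and the sets module is gone, a comparable measurement can be sketched with timeit. This is a minimal sketch: the data size and repeat count are arbitrary, and the 20% figure above was for sets.Set under 2.3, so no particular ratio should be assumed to carry over to other versions or machines.

```python
import random
import timeit

# Same shape as the session above (many duplicates), but smaller so it
# runs quickly: 200,000 ints drawn from a range of 100,000.
data = [random.randrange(100_000) for _ in range(200_000)]

def unique_count_set(values):
    """Count distinct values with the builtin set type."""
    return len(set(values))

def unique_count_dict(values):
    """Count distinct values with the dict.fromkeys idiom."""
    return len(dict.fromkeys(values))

# The two approaches must agree before timing them means anything.
assert unique_count_set(data) == unique_count_dict(data)

for func in (unique_count_set, unique_count_dict):
    # Best of three single runs; taking the minimum reduces noise.
    best = min(timeit.repeat(lambda: func(data), number=1, repeat=3))
    print(f"{func.__name__}: {best:.4f} s")
```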


>>(The 'list()' is needed because that's the only way to get elements
>>out of a Set.  It provides an __iter__ but no 'tolist()' method.)
> 
> 
> And this is the canonical way to transform any iterable to a list. Why
> should every class that you want to transform to a list have to supply a
> `tolist` method? Why not a `totuple` method?

I put that there as a reminder for fogies like me who even now have
spent more time on pre-2.x versions of Python than on post-2.x versions.
When I started back in the 1.3 days, there were modules like 'array',
which *did* have a 'tolist' method, and that was the proper way to
do it.

 >>> import array
 >>> x=array.array("c", "AndreW")
 >>> x
array('c', 'AndreW')
 >>> x.tolist()
['A', 'n', 'd', 'r', 'e', 'W']
 >>>
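The array type still has tolist() in current Python, and for it list() gives the same result, which is Timothy's point: list() works on any iterable. A small sketch; the 'c' type code used above was dropped in Python 3, so this uses 'i' (C signed int) instead, with made-up sample values.

```python
import array

# Python 3 dropped the 'c' (char) type code; 'i' holds C signed ints.
a = array.array("i", [3, 1, 4, 1, 5])

# The type-specific method and the generic constructor agree.
print(a.tolist())  # [3, 1, 4, 1, 5]
print(list(a))     # [3, 1, 4, 1, 5]

# list() works the same way on any iterable, including sets and
# generators, which have no tolist() method at all.
print(sorted(set(a)))                 # [1, 3, 4, 5]
print(list(x * 2 for x in range(3)))  # [0, 2, 4]
```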

The implication that there should be one was not my intention, though
my wording in that regard was unfortunate.

This is also a case where it isn't obvious how to get data out of a
container.  Every other container spells it with [] or with a method
whose name *doesn't* start with "_", so people just starting with
Set might not know what to look for.

It would be nice if the example code showed how to iterate over the
data in a Set...
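Something along these lines, written for the builtin set that later replaced sets.Set, shows the usual ways to get data back out: plain iteration, list(), and sorted() when order matters, since sets are unordered. The sample data is made up for illustration.

```python
data = ["pear", "apple", "pear", "fig", "apple"]
unique = set(data)

# 1. Iterate directly; the order is arbitrary.
for item in unique:
    print(item)

# 2. Materialize as a list (order still arbitrary).
as_list = list(unique)

# 3. Sorted unique elements, the task in the subject line.
print(sorted(unique))  # ['apple', 'fig', 'pear']
```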


>>The other is with the new 'fromkeys' class, which constructs
> 
> 
> Actually, dictionary class (static?) method.

Yep.  I meant to say "class method"; it just didn't make it through
my fingers.
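The distinction is real: a class method receives the class it is called on, so calling fromkeys on a dict subclass returns an instance of that subclass, which a static method could not do. A short sketch; `TaggedDict` is a hypothetical subclass used only for illustration.

```python
# fromkeys is a classmethod on dict; every value defaults to None.
d = dict.fromkeys(["b", "a", "b"])
print(d)  # {'b': None, 'a': None}

# An optional second argument supplies the shared value.
print(dict.fromkeys("ab", 0))  # {'a': 0, 'b': 0}

class TaggedDict(dict):
    """Hypothetical dict subclass, only here to show classmethod dispatch."""

td = TaggedDict.fromkeys(["x", "y"])
# Because fromkeys is a classmethod, it builds the subclass, not a plain dict.
print(type(td).__name__)  # TaggedDict
```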

> This, whilst slightly shorter (due to no import - which in future versions
> will be going away anyway), is definitely *not* the Right Way (TM) to do it.
> It is likely to confuse people.

It will?  Given how much pre-2.3 code uses the "build a dict, then get
the keys" idiom to find the unique values in a data set, it's one that
any intermediate Python programmer should already understand.
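For reference, the pre-2.3 idiom looks roughly like the first function below, and fromkeys (new in 2.3) collapses the loop into a single call. A sketch in current Python syntax, where keys() returns a view rather than a list, so sorted() is used to produce a sorted list of the unique values; the function names are illustrative.

```python
def unique_sorted_loop(values):
    """The older idiom: use dict keys to discard duplicates."""
    seen = {}
    for v in values:
        seen[v] = 1          # the stored value is irrelevant
    return sorted(seen)      # iterating a dict yields its keys

def unique_sorted_fromkeys(values):
    """The same idea with dict.fromkeys."""
    return sorted(dict.fromkeys(values))

print(unique_sorted_loop([3, 1, 3, 2, 1]))      # [1, 2, 3]
print(unique_sorted_fromkeys([3, 1, 3, 2, 1]))  # [1, 2, 3]
```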

As for beginning Python programmers, I can't put myself into their
shoes.

My feeling for now is that I'll use "Set" when I want to do set
manipulations, like

   set1 = {identifiers matching query 1}
   set2 = {identifiers matching query 2}
   total = set1 | set2

and not use it for getting unique values.
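A concrete version of that kind of manipulation, using the builtin set; the query results are invented identifiers, not real data. Union is spelled `|` (or `.union()`); the builtin set does not support `+`.

```python
# Hypothetical identifiers returned by two queries.
query1_ids = ["id3", "id1", "id7", "id3"]
query2_ids = ["id7", "id9"]

set1 = set(query1_ids)
set2 = set(query2_ids)

total = set1 | set2    # union: in either result
both = set1 & set2     # intersection: in both results
only1 = set1 - set2    # difference: only in the first result

print(sorted(total))  # ['id1', 'id3', 'id7', 'id9']
print(sorted(both))   # ['id7']
print(sorted(only1))  # ['id1', 'id3']
```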


					Andrew
					dalke at dalkescientific.com





More information about the Python-list mailing list