sorted unique elements from a list; using 2.3 features
Andrew Dalke
adalke at mindspring.com
Mon Jan 6 05:23:48 EST 2003
Delaney, Timothy wrote:
> Using sets is definitely the Right Way (TM) to do it. This is one of the
> primary use cases for sets (*everyone* wants to do this).
- the performance of Sets is slower than that of a simple dict
(because, after all, Sets are built on top of a dict but with
extra overhead). I just tested it -- fromdict is about 20%
faster than using Set
>>> import time, sets, random
>>> data = [random.randrange(1000000) for i in range(2000000)]
>>> def do_set():
...     return len(sets.Set(data))
...
>>> def do_dict():
...     return len(dict.fromkeys(data).keys())
...
>>> t1=time.clock();do_set();t2=time.clock()
865149
>>> t2-t1
2.9100000000000001
>>> t1=time.clock();do_dict();t2=time.clock()
865149
>>> t2-t1
2.3299999999999983
>>> 2.33/2.9
0.80344827586206902
>>>
- there's the extra import, which is a bit tedious if you don't
need the power of a Set
- using dicts is a basic part of using Python, so the step to using
a different way to construct a dict is easier than thinking
about using a different class
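The timing comparison above can be rerun on a current interpreter. What follows is only a sketch under assumptions: it uses the built-in 'set' (which replaced 'sets.Set' in later Pythons) and 'timeit' in place of 'time.clock()', with a smaller data set so it finishes quickly. The function names are illustrative, and the numbers it prints will not match the 2003 figures.

```python
import random
import timeit

# Smaller than the original 2,000,000 items so the sketch runs quickly.
data = [random.randrange(100000) for _ in range(200000)]

def count_with_set():
    # The built-in set stands in for sets.Set (an assumption, see above).
    return len(set(data))

def count_with_dict():
    # dict.fromkeys keeps exactly one key per distinct value.
    return len(dict.fromkeys(data))

# Take the best of three runs to reduce noise from other processes.
set_seconds = min(timeit.repeat(count_with_set, number=5, repeat=3))
dict_seconds = min(timeit.repeat(count_with_dict, number=5, repeat=3))
print("set:  %.4f s" % set_seconds)
print("dict: %.4f s" % dict_seconds)
```

Which of the two comes out ahead depends on the interpreter version, so the ratio is worth measuring rather than assuming.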
>>(The 'list()' is needed because that's the only way to get elements
>>out from a list. It provides an __iter__ but no 'tolist()' method.)
>
>
> And this is the canonical way to transform any iterable to a list. Why
> should every class that you want to transform to a list have to supply a
> `tolist` method? Why not a `totuple` method?
I put that there as a reminder for fogies like me who even now have
spent more time on pre-2.x versions of Python than post-2.x versions.
When I started back in the 1.3 days, there were modules like 'array',
which *did* have a 'tolist' method, and that was the proper way to
do it.
>>> import array
>>> x=array.array("c", "AndreW")
>>> x
array('c', 'AndreW')
>>> x.tolist()
['A', 'n', 'd', 'r', 'e', 'W']
>>>
The implication that there should be one was not my intention, though
my wording in that regard was unfortunate.
This is also a case where it isn't obvious how to get data from a
container. Every other container spells it through [] or through
a method name which *doesn't* start with a "_". So people just
starting with a Set might not know what to look for.
It would be nice if the example code showed iterating data from
a Set...
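As a sketch of what such an example might look like, here is one way to read elements back out of a set by iteration. It assumes a later Python with the built-in 'set' and 'sorted()' (neither exists in 2.3, where the class is 'sets.Set'); the variable names are illustrative.

```python
unique = set([3, 1, 2, 3, 1])

# A set has no [] indexing; a for loop is the direct way to read it.
total = 0
for value in unique:
    total += value

# list() and sorted() both consume the same iteration protocol.
as_list = list(unique)      # order is arbitrary
in_order = sorted(unique)   # ascending order
print(in_order)
```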
>>The other is with the new 'fromkeys' class, which constructs
>
>
> Actually, dictionary class (static?) method.
Yep. Meant to say "class method". Just didn't get through my
fingers.
> This, whilst slightly shorter (due to no import - which in future versions
> will be going away anyway), is definitely *not* the Right Way (TM) to do it.
> It is likely to confuse people.
It will? Given how much pre-2.3 code uses the "build a dict then
get the keys" to get the unique values in a data set, it's an idiom
that any intermediate Python programmer should understand and expect
to understand.
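For readers who have not met it, the idiom referred to here looks roughly like the first function below; the second is the 'fromkeys' spelling of the same thing. Both are sketches with hypothetical function names, written so they also run on current Pythons.

```python
def unique_sorted(items):
    # Classic pre-2.3 idiom: use dict keys to discard duplicates.
    seen = {}
    for item in items:
        seen[item] = 1   # the value is irrelevant; only the keys matter
    result = list(seen.keys())
    result.sort()
    return result

def unique_sorted_fromkeys(items):
    # Same idea using the dict.fromkeys class method added in 2.3.
    result = list(dict.fromkeys(items))
    result.sort()
    return result

print(unique_sorted([5, 3, 5, 1, 3]))
print(unique_sorted_fromkeys([5, 3, 5, 1, 3]))
```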
As for beginning Python programmers, I can't put myself into their
shoes.
My feeling for now is that I'll use "Set" when I want to do set
manipulations, like
set1 = { identifiers matching query 1}
set2 = { identifiers matching query 2}
total = set1 | set2
and not use it for getting unique values.
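A minimal sketch of that kind of set manipulation, assuming the built-in 'set' of later Pythons and made-up identifiers standing in for real query results. Union is spelled '|' (sets do not define '+').

```python
# Hypothetical identifiers returned by two queries.
set1 = set(["P12345", "Q67890", "A00001"])
set2 = set(["Q67890", "B00002"])

total = set1 | set2        # union: matched either query
both = set1 & set2         # intersection: matched both queries
only_first = set1 - set2   # difference: matched query 1 only

print(sorted(total))
```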
Andrew
dalke at dalkescientific.com