[Baypiggies] hello and returning unique elements

Tim Hatch tim at timhatch.com
Fri Apr 8 19:49:23 CEST 2011


From some quick experiments, it looks like you can double the speed of
your version by skipping the reversed/sorted machinery and just keeping
track of what's been seen in a much less exciting way:

def unique2(seq):
  """Return a list of the unique items, preserving first-seen order."""
  seen = set()
  out = []
  for i in seq:
    if i not in seen:
      out.append(i)
      seen.add(i)
  return out

# yours
$ python -m timeit -s 'from demo import unique' 'sum(unique([1, 1, 2, 3, 4]))'
100000 loops, best of 3: 5.07 usec per loop
# mine
$ python -m timeit -s 'from demo import unique2' 'sum(unique2([1, 1, 2, 3, 4]))'
100000 loops, best of 3: 2.65 usec per loop

You can actually make it a smidge faster still by turning it into a
generator, at least for this tiny case (as the input grows into the
thousands of items, its timings converge toward unique2's):

def unique3(seq):
  """Yield the unique items lazily, preserving first-seen order."""
  seen = set()
  for i in seq:
    if i not in seen:
      yield i
      seen.add(i)

$ python -m timeit -s 'from demo import unique3' 'sum(unique3([1, 1, 2, 3, 4]))'
1000000 loops, best of 3: 1.99 usec per loop
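
If you want to check the crossover at larger sizes yourself, something
along these lines should do it (the 5000-element sequence is just an
illustrative choice, not one I actually timed above):

$ python -m timeit -s 'from demo import unique2; seq = range(5000) * 2' 'sum(unique2(seq))'
$ python -m timeit -s 'from demo import unique3; seq = range(5000) * 2' 'sum(unique3(seq))'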

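As for the collections.OrderedDict question in the quoted mail below: on
2.7 and up the whole thing collapses to roughly this (an untested sketch;
unique4 is just a name I'm picking here):

from collections import OrderedDict

def unique4(seq):
  # fromkeys keeps only the first occurrence of each key, and OrderedDict
  # preserves insertion order, so this returns the unique elements in
  # their original order
  return list(OrderedDict.fromkeys(seq))

Hard to beat for clarity, though since OrderedDict is pure Python in 2.7
I'd expect the plain-set versions above to stay faster.
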
Tim

> Erratum: the function I suggested is incomplete without this line
> following the docstring.
>
> if seq is None: return None
>
>
> On Thu, Apr 7, 2011 at 6:42 PM, Hy Carrinski <hcarrinski at gmail.com> wrote:
>> Dear BayPIGgies,
>>
>> Thank you for providing exceptional reading material over the past
>> year for an avid conversation follower and occasional meeting
>> attendee.
>>
>> I would like to join the conversation and contribute to the community
>> beginning with this post. I will attend our meeting this month, and
>> look forward to meeting at Noisebridge or elsewhere to watch PyCon
>> 2011 videos together.
>>
>> While thinking about unique lists in the context of Vikram's recent
>> question, I came across a related blog post at
>> http://www.peterbe.com/plog/uniqifiers-benchmark
>>
>> The post emphasizes speed and maintaining original list order when
>> removing duplicates from a list.
>>
>> Although the post is several years old, I thought of a solution to the
>> question it poses that is different from the ones proposed by
>> BayPIGgies either in the blog's comments or at a previous discussion
>> (before reversed, sorted, and set were introduced in Python 2.4) at
>> http://code.activestate.com/recipes/52560/
>>
>> I will be grateful for any comments on my "ask forgiveness rather than
>> permission" implementation. It requires that all elements of the list
>> be hashable, and it works for Python 2.x where x >= 4.
>>
>> def unique(seq):
>>    """Return unique list of hashable objects while preserving order."""
>>    seen = {}
>>    for ix in reversed(xrange(len(seq))):
>>        seen[seq[ix]] = ix
>>    return sorted(seen, key=seen.__getitem__)
>>
>> For the case of hashable objects, are there disadvantages to this
>> implementation versus the ones referenced above? Is there a reason to
>> prefer operator.itemgetter over __getitem__ in this limited context?
>> Is there an advantage in clarity or performance to using
>> collections.OrderedDict in Python 2.7 and above? For lists containing
>> a great number of distinct objects, the sort step will place a lower
>> bound on this implementation's running time.
>>
>> I look forward to meeting more BayPIGgies in the coming months and
>> years.
>>
>> Cheers,
>> Hy
>>