[Tutor] Ways of removing consequtive duplicates from a list

avi.e.gross at gmail.com avi.e.gross at gmail.com
Sun Jul 17 12:59:15 EDT 2022


You could make the case, Peter, that you can use anything as a start that
will not likely match in your domain. You are correct if an empty string may
be in the data. 

Now an object returned by object is pretty esoteric and ought to be rare and
indeed each new object seems to be individual.

val=object()

[(val := ele) for ele in [1,1,2,object(),3,3,3] if ele != val]
-->> [1, 2, <object object at 0x00000176F33150D0>, 3]

So the only way to trip this up is to use the same object or another
reference to it where it is silently ignored.

[(val := ele) for ele in [1,1,2,val,3,3,3] if ele != val]
-->> [1, 2, 3]

valiant = val
[(val := ele) for ele in [1,1,2,valiant,3,3,3] if ele != val]
-->> [1, 2, 3]

But just about any out-of-band will presumably do. I mean if you are
comparing just numbers, all you need do is slip in something else like "The
quick brown fox jumped over the lazy dog" or float("inf") or even val =
(math.inf, -math.inf) and so on.

I would have thought also of using the special value of None and it works
fine unless the string has a None!

So what I see here is a choice between a heuristic solution that can fail to
work quite right on a perhaps obscure edge case, or a fully deterministic
algorithm that knows which is the first and treats it special.

The question asked was about efficiency, so let me ask a dumb question.

Is there a difference in efficiency of comparing to different things over
and over again in the loop? I would think so. Comparing to None could turn
out to be trivial. Math.inf as implemented in python seems to just be a big
floating number as is float("inf") and I have no idea what an object() looks
like but assume it is the parent class of all other objects ad thus has no
content but some odd methods attached.

Clearly the simplest comparison might be variable depending on what the data
you are working on is.

So, yes, an unconditional way of dealing with the first item often is
needed. It is very common in many algorithms for the first and perhaps last
item to have no neighbor on one side.

-----Original Message-----
From: Tutor <tutor-bounces+avi.e.gross=gmail.com at python.org> On Behalf Of
Peter Otten
Sent: Sunday, July 17, 2022 4:27 AM
To: tutor at python.org
Subject: Re: [Tutor] Ways of removing consequtive duplicates from a list

On 17/07/2022 00:01, Alex Kleider wrote:

> PS My (at least for me easier to comprehend) solution:
>
> def rm_duplicates(iterable):
>      last = ''
>      for item in iterable:
>          if item != last:
>              yield item
>              last = item

The problem with this is the choice of the initial value for 'last':

 >>> list(rm_duplicates(["", "", 42, "a", "a", ""]))
[42, 'a', '']   # oops, we lost the initial empty string

Manprit avoided that in his similar solution by using a special value that
will compare false except in pathological cases:

 > val = object()
 > [(val := ele) for ele in lst if ele != val]

Another fix is to yield the first item unconditionally:

def rm_duplicates(iterable):
     it = iter(iterable)
     try:
         last = next(it)
     except StopIteration:
         return
     yield last
     for item in it:
         if item != last:
             yield item
             last = item

If you think that this doesn't look very elegant you may join me in the
https://peps.python.org/pep-0479/ haters' club ;)
_______________________________________________
Tutor maillist  -  Tutor at python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor



More information about the Tutor mailing list