[Tutor] Ways of removing consequtive duplicates from a list

avi.e.gross at gmail.com avi.e.gross at gmail.com
Mon Jul 18 13:42:54 EDT 2022


Peter,

I studied Pathology in school but we used human bodies rather than the
pythons you are abusing.

The discussion we are having is almost as esoteric and has to do with
theories of computation and what algorithms are in some sense provable and
which are probabilistic to the point of them failing being very rare and
which are more heuristic and tend to work and perhaps get a solution that is
close enough to optimal and which ones never terminate and so on.

My point was that I played with your idea and was convinced it should work
as long as you only create the object once and never copy it or in any way
include it in the list or iterable. That seems very doable.

But your new comment opens up another door. 

Turn your class A around:

class A:
     def __eq__(self, other): return True
     def __ne__(self, other): return False

make it:

class all_alone:
     def __eq__(self, other): return False
     def __ne__(self, other): return True

If you made an object of that class, it won't even report being equal to
itself!

That is very slightly better but not really an important distinction. But
what happens when A is compared to all_alone may depend on which is first.
Worth a try? You can always flip the order of the comparison as needed.

I do note back in my UNIX days, we often needed a guaranteed unique ID as in
a temporary filename and often used a process ID deemed to be unique. But
processed come and go and eventually that process ID is re-used and odd
things can happen if it find files already there or ...

-----Original Message-----
From: Tutor <tutor-bounces+avi.e.gross=gmail.com at python.org> On Behalf Of
Peter Otten
Sent: Monday, July 18, 2022 3:15 AM
To: tutor at python.org
Subject: Re: [Tutor] Ways of removing consequtive duplicates from a list

On 17/07/2022 18:59, avi.e.gross at gmail.com wrote:
> You could make the case, Peter, that you can use anything as a start 
> that will not likely match in your domain. You are correct if an empty 
> string may be in the data.
>
> Now an object returned by object is pretty esoteric and ought to be 
> rare and indeed each new object seems to be individual.
>
> val=object()
>
> [(val := ele) for ele in [1,1,2,object(),3,3,3] if ele != val]
> -->> [1, 2, <object object at 0x00000176F33150D0>, 3]
>
> So the only way to trip this up is to use the same object or another 
> reference to it where it is silently ignored.

When you want a general solution for removal of consecutive duplicates you
can put the line

val = object()

into the deduplication function which makes it *very* unlikely that val will
also be passed as an argument to that function.

To quote myself:

> Manprit avoided that in his similar solution by using a special value 
> that will compare false except in pathological cases:
>
>> val = object()
>> [(val := ele) for ele in lst if ele != val]

What did I mean with "pathological"?

One problematic case would be an object that compares equal to everything,

class A:
     def __eq__(self, other): return True
     def __ne__(self, other): return False

but that is likely to break the algorithm anyway.

Another problematic case: objects that only implement comparison for other
objects of the same type. For these deduplication will work if you avoid the
out-of-band value:

 >>> class A:
	def __init__(self, name):
		self.name = name
	def __eq__(self, other): return self.name == other.name
	def __ne__(self, other): return self.name != other.name
	def __repr__(self): return f"A(name={self.name})"


 >>> prev = object()
 >>>
 >>> [(prev:=item) for item in map(A, "abc") if item != prev] Traceback
(most recent call last):
   File "<pyshell#57>", line 1, in <module>
     [(prev:=item) for item in map(A, "abc") if item != prev]
   File "<pyshell#57>", line 1, in <listcomp>
     [(prev:=item) for item in map(A, "abc") if item != prev]
   File "<pyshell#54>", line 5, in __ne__
     def __ne__(self, other): return self.name != other.name
AttributeError: 'object' object has no attribute 'name'


 >>> def rm_duplicates(iterable):
     it = iter(iterable)
     try:
         last = next(it)
     except StopIteration:
         return
     yield last
     for item in it:
         if item != last:
             yield item
             last = item

 >>> list(rm_duplicates(map(A, "aabccc"))) [A(name=a), A(name=b), A(name=c)]
>>> _______________________________________________
Tutor maillist  -  Tutor at python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor



More information about the Tutor mailing list