[Patches] [ python-Patches-460402 ] PEP 270: uniq() method for list objects
noreply@sourceforge.net
Thu, 13 Sep 2001 08:34:49 -0700
Patches item #460402, was opened at 2001-09-10 11:48
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=460402&group_id=5470
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jason Petrone (jpetrone)
Assigned to: Nobody/Anonymous (nobody)
Summary: PEP 270: uniq() method for list objects
Initial Comment:
Remove duplicate elements from a list object.
3 patches:
uniq.doc.patch - Documentation changes in libstdtypes.tex.
uniq.ordered.patch - Changes to listmodule.c for brute-force duplicate removal with uniq(). Maintains list order.
uniq.unordered.patch - Changes to listmodule.c for duplicate removal with uniq(). First tries using a mapping object; if the list elements aren't usable as mapping keys, tries sorted removal; if the list isn't sortable, finally falls back on brute-force removal. Does not maintain list order.
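The tiered fallback strategy described for uniq.unordered.patch can be sketched in pure Python (the function name and the exception-driven fallbacks below are illustrative, not the actual C implementation in the patch):

```python
def uniq_unordered(lst):
    """Remove duplicates from lst; order is not guaranteed.
    Sketch of the mapping -> sort -> brute-force fallback strategy."""
    # 1. Fast path: elements usable as mapping keys go through a dict, O(N).
    try:
        seen = {}
        for x in lst:
            seen[x] = None
        return list(seen)
    except TypeError:
        pass  # some element is unhashable
    # 2. Next: sortable elements -- sort, then drop consecutive
    #    duplicates, O(N log N).
    try:
        s = sorted(lst)
        result = s[:1]
        for x in s[1:]:
            if x != result[-1]:
                result.append(x)
        return result
    except TypeError:
        pass  # elements aren't mutually comparable
    # 3. Last resort: brute-force O(N**2) scan, needs only equality.
    result = []
    for x in lst:
        if x not in result:
            result.append(x)
    return result
```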
----------------------------------------------------------------------
>Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-13 08:34
Message:
Logged In: YES
user_id=6380
I've always assumed that the reason uniq(1) only removes
consecutive duplicates was so that it doesn't have to
use an O(N**2) algorithm where an O(N log N) algorithm will
do. If you need all duplicates removed, you can do it as
follows:
L.sort()
L.uniq()
In many cases, you can ensure uniqueness by simply choosing
the right data structures. The keys of a dictionary can be
used to represent a set; there's also talk of adding a set
data type (see PEP 218).
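The dictionary-keys approach mentioned above looks like this in a short sketch (the helper name is illustrative; it requires elements that can serve as dict keys):

```python
def unique_via_dict(items):
    # Use dict keys as a set: each element becomes a key, so
    # duplicates collapse. Runs in O(N) for hashable elements.
    seen = {}
    for x in items:
        seen[x] = None
    return list(seen)  # insertion order is kept in Python 3.7+
```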
I still haven't seen a good motivation for this addition. If
you don't post one here, I'll reject this.
----------------------------------------------------------------------
Comment By: Jason Petrone (jpetrone)
Date: 2001-09-13 07:54
Message:
Logged In: YES
user_id=71210
I've always assumed the unix uniq(1) program only removes
consecutive duplicates so that it can operate on very large
files without buffering them entirely in memory.
In Python, buffering the data in memory isn't an issue,
since it is already there. Also, in most cases a hash
table can be used for better performance than pre-sorting
and removing consecutive duplicates.
My main reason for not wanting to mimic unix uniq's
functionality is that I've never really been in a
situation where I've only needed consecutive duplicates
removed. I think achieving global uniqueness in a list is
a much more common task.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-12 14:18
Message:
Logged In: YES
user_id=6380
Hm. The unix uniq(1) program only removes *consecutive*
duplicates. That's an O(N) algorithm that only requires
equality to be defined, and it leaves the sorting to the
caller. Would it make sense to support only that?
I guess I'm looking for a use case here...
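The uniq(1)-style behavior described above, consecutive duplicates only, O(N), needing nothing but equality, is only a few lines of Python (a sketch, not part of the patch; the function name is illustrative):

```python
def uniq_consecutive(lst):
    # Drop elements equal to their immediate predecessor, as uniq(1)
    # does. O(N), uses only ==; the caller sorts first if global
    # uniqueness is wanted.
    result = lst[:1]
    for x in lst[1:]:
        if x != result[-1]:
            result.append(x)
    return result

# Global uniqueness via sort-then-uniq, as in the comment above:
#   L.sort()
#   L = uniq_consecutive(L)
```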
----------------------------------------------------------------------