[Patches] [ python-Patches-460402 ] PEP 270: uniq() method for list objects
noreply@sourceforge.net
Thu, 13 Sep 2001 08:34:49 -0700
Patches item #460402, was opened at 2001-09-10 11:48
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=460402&group_id=5470
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jason Petrone (jpetrone)
Assigned to: Nobody/Anonymous (nobody)
Summary: PEP 270: uniq() method for list objects
Initial Comment:
Remove duplicate elements from a list object.
3 patches:
uniq.doc.patch - Documentation changes in libstdtypes.tex.
uniq.ordered.patch - Changes to listmodule.c for brute-force duplicate removal with uniq(). Maintains list order.
uniq.unordered.patch - Changes to listmodule.c for duplicate removal with uniq(). First tries using a mapping object; if the list elements aren't usable as mapping keys, tries sorted removal; if the list isn't sortable, finally falls back on brute-force removal. Does not maintain list order.
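The tiered fallback strategy described for uniq.unordered.patch can be sketched in pure Python (the function name and the exception-driven fallbacks below are illustrative, not the actual C implementation in the patch):

```python
def uniq_unordered(lst):
    """Remove duplicates from lst; order is not guaranteed.
    Sketch of the mapping -> sort -> brute-force fallback strategy."""
    # 1. Fast path: elements usable as mapping keys go through a dict, O(N).
    try:
        seen = {}
        for x in lst:
            seen[x] = None
        return list(seen)
    except TypeError:
        pass  # some element is unhashable
    # 2. Next: sortable elements -- sort, then drop consecutive
    #    duplicates, O(N log N).
    try:
        s = sorted(lst)
        result = s[:1]
        for x in s[1:]:
            if x != result[-1]:
                result.append(x)
        return result
    except TypeError:
        pass  # elements aren't mutually comparable
    # 3. Last resort: brute-force O(N**2) scan, needs only equality.
    result = []
    for x in lst:
        if x not in result:
            result.append(x)
    return result
```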
----------------------------------------------------------------------
>Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-13 08:34
Message:
Logged In: YES
user_id=6380
I've always assumed that the reason uniq(1) only removes
consecutive duplicates was so that it doesn't have to
use an O(N**2) algorithm where an O(N log N) algorithm will
do. If you need all duplicates removed, you can do it as
follows:
L.sort()
L.uniq()
In many cases, you can ensure uniqueness by simply choosing
the right data structures. The keys of a dictionary can be
used to represent a set; there's also talk of adding a set
data type (see PEP 218).
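The dictionary-keys approach mentioned above looks like this in a short sketch (the helper name is illustrative; it requires elements that can serve as dict keys):

```python
def unique_via_dict(items):
    # Use dict keys as a set: each element becomes a key, so
    # duplicates collapse. Runs in O(N) for hashable elements.
    seen = {}
    for x in items:
        seen[x] = None
    return list(seen)  # insertion order is kept in Python 3.7+
```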
I still haven't seen a good motivation for this addition. If
you don't post one here, I'll reject this.
----------------------------------------------------------------------
Comment By: Jason Petrone (jpetrone)
Date: 2001-09-13 07:54
Message:
Logged In: YES
user_id=71210
I've always assumed the unix uniq(1) program only removes
consecutive duplicates so that it can operate on very large
files without buffering them entirely in memory.
In Python, buffering the data in memory isn't an issue,
since it is already there. Also, in most cases a hash
table can be used for better performance than pre-sorting
and removing consecutive duplicates.
My main reason for not wanting to mimic unix uniq's
functionality is that I've never really been in a
situation where I've only needed consecutive duplicates
removed. I think achieving global uniqueness in a list is
a much more common task.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-12 14:18
Message:
Logged In: YES
user_id=6380
Hm. The unix uniq(1) program only removes *consecutive*
duplicates. That's an O(N) algorithm that only requires
equality to be defined, and it leaves the sorting to the
caller. Would it make sense to support only that?
I guess I'm looking for a use case here...
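The uniq(1)-style behavior described above, consecutive duplicates only, O(N), needing nothing but equality, is only a few lines of Python (a sketch, not part of the patch; the function name is illustrative):

```python
def uniq_consecutive(lst):
    # Drop elements equal to their immediate predecessor, as uniq(1)
    # does. O(N), uses only ==; the caller sorts first if global
    # uniqueness is wanted.
    result = lst[:1]
    for x in lst[1:]:
        if x != result[-1]:
            result.append(x)
    return result

# Global uniqueness via sort-then-uniq, as in the comment above:
#   L.sort()
#   L = uniq_consecutive(L)
```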
----------------------------------------------------------------------