[Catalog-sig] Questions about efficiency.

Mike Olson Mike.Olson@fourthought.com
Mon, 28 May 2001 14:38:22 -0600


Luis Leonel Lopez wrote:
> 
> Dear friends,
> 
> I wrote two functions which take a sequence and return a list of its
> elements without duplicates. Compared to Tim Peters' "unique" function
> (as it appears in the Python Cookbook:




Here is the unique function I wrote.  I haven't really analyzed it much,
but it seems to be pretty fast.  At least it is faster than the for-loop
approaches for large sets:

def Unique(left):
    # Append x to the accumulator rt only when it isn't already there.
    return reduce(lambda rt, x: x in rt and rt or rt + [x], left, [])
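
As a quick sanity check, reusing the examples from Tim's docstring
(note that, unlike the dict method, this keeps elements in the order
of their first appearance):

    >>> Unique([1, 2, 3, 1, 2, 3])
    [1, 2, 3]
    >>> Unique("abcabc")
    ['a', 'b', 'c']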
 

Mike


> 
> def unique(s):
>     """Return a list of the elements in s, but without duplicates.
> 
>     For example, unique([1,2,3,1,2,3]) is some permutation of [1,2,3],
>     unique("abcabc") some permutation of ["a", "b", "c"], and
>     unique(([1, 2], [2, 3], [1, 2])) some permutation of
>     [[2, 3], [1, 2]].
> 
>     For best speed, all sequence elements should be hashable.  Then
>     unique() will usually work in linear time.
> 
>     If not possible, the sequence elements should enjoy a total
>     ordering, and if list(s).sort() doesn't raise TypeError it's
>     assumed that they do enjoy a total ordering.  Then unique() will
>     usually work in O(N*log2(N)) time.
> 
>     If that's not possible either, the sequence elements must support
>     equality-testing.  Then unique() will usually work in quadratic
>     time.
>     """
> 
>     n = len(s)
>     if n == 0:
>         return []
> 
>     # Try using a dict first, as that's the fastest and will usually
>     # work.  If it doesn't work, it will usually fail quickly, so it
>     # usually doesn't cost much to *try* it.  It requires that all the
>     # sequence elements be hashable, and support equality comparison.
>     u = {}
>     try:
>         for x in s:
>             u[x] = 1
>     except TypeError:
>         del u  # move on to the next method
>     else:
>         return u.keys()
> 
>     # We can't hash all the elements.  Second fastest is to sort,
>     # which brings the equal elements together; then duplicates are
>     # easy to weed out in a single pass.
>     # NOTE:  Python's list.sort() was designed to be efficient in the
>     # presence of many duplicate elements.  This isn't true of all
>     # sort functions in all languages or libraries, so this approach
>     # is more effective in Python than it may be elsewhere.
>     try:
>         t = list(s)
>         t.sort()
>     except TypeError:
>         del t  # move on to the next method
>     else:
>         assert n > 0
>         last = t[0]
>         lasti = i = 1
>         while i < n:
>             if t[i] != last:
>                 t[lasti] = last = t[i]
>                 lasti += 1
>             i += 1
>         return t[:lasti]
> 
>     # Brute force is all that's left.
>     u = []
>     for x in s:
>         if x not in u:
>             u.append(x)
>     return u
> 
> ), my functions use the slowest method, brute force. I'd like to know
> why, and which of my functions is better or more efficient, and why.
> 
> def onlyOne1(s):
>     r = []
>     for x in s:
>         if x not in r:
>             r.append(x)
>     return r
> 
> def onlyOne2(s):
>     r = []
>     for x in s:
>         try:
>             r.index(x)
>         except ValueError:
>             r.append(x)
>     return r
> 
> Thank you in advance!
> 
> Luis Leonel Lopez
> 
> _______________________________________________
> Catalog-sig mailing list
> Catalog-sig@python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
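
To get at Luis's question about which approach is more efficient, a
quick-and-dirty timing harness is probably the easiest way to compare
all three on a large input.  This is only a sketch -- it assumes the
onlyOne1, onlyOne2 and Unique definitions above, and it measures
wall-clock time for a single run rather than doing any real profiling:

import time

def elapsed(f, s):
    # Run f once over s and return how long it took.
    start = time.time()
    f(s)
    return time.time() - start

data = range(1000) * 5   # 5000 elements, lots of duplicates
for f in (onlyOne1, onlyOne2, Unique):
    print f.__name__, elapsed(f, data)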

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python