Yet another unique() function...

Paul Rubin
Wed Feb 28 15:18:16 EST 2007


bearophileHUGS at lycos.com writes:
> It's more terse, but my version is built to be faster in the more
> common cases of all hashable or/and all sortable items (while working
> in other cases too).
> Try your unique on an unicode string, that's probably a bug (keepstr
> is being ignored).
> Version by Paul Rubin is very short, but rather unreadable too.
> 
> Bye,
> bearophile

Unicode fix (untested):

def unique(seq, keepstr=True):
    t = type(seq)
    if t in (unicode, str):
        t = (list, t('').join)[bool(keepstr)]
    seen = []
    return t(c for c in seq if not (c in seen or seen.append(c)))
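
The one-liner above depends on list.append returning None: the expression "c in seen or seen.append(c)" is true for a repeat, and for a new item it records the item and evaluates to None. A minimal sketch of the same logic written out as a loop, in Python 3 syntax; the names unique_list and unique_oneliner are made up for this illustration and the type-preserving part is left out:

```python
def unique_list(seq):
    """Return the first occurrence of each item in seq, in order, as a list."""
    seen = []
    result = []
    for c in seq:
        if c in seen:        # repeat: skip it
            continue
        seen.append(c)       # new item: remember it and keep it
        result.append(c)
    return result


def unique_oneliner(seq):
    """Same behavior, using the append-returns-None trick."""
    seen = []
    return [c for c in seq if not (c in seen or seen.append(c))]
```

Both versions use a list for "seen", so they cost O(n**2) comparisons but accept unhashable items.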

Case by case optimization (untested):

def unique(seq, keepstr=True):
    t = type(seq)
    if t in (unicode, str):
        t = (list, t('').join)[bool(keepstr)]
    try:
        remaining = set(seq)
        # set.remove returns None, so each item passes the test only once
        return t(c for c in seq if (c in remaining and
                                    not remaining.remove(c)))
    except TypeError: # hashing didn't work, see if seq is sortable
        try:
            from itertools import groupby
            s = sorted(enumerate(seq),key=lambda (i,v):(v,i))
            # first (index, value) pair of each group, back in original order
            firsts = sorted(g.next() for k,g in groupby(s, lambda (i,v): v))
            return t(v for i,v in firsts)
        except:  # not sortable, use brute force
            seen = []
            return t(c for c in seq if not (c in seen or seen.append(c)))

I don't have Python 2.4 available right now to try either of the above.
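
For readers on a current interpreter, here is a rough Python 3 sketch of the same three-tier strategy (hashing, then sorting, then brute force). It is not the code above: the name unique3 is hypothetical, it uses a plain "seen" set instead of the "remaining" set, and it assumes seq is a list, tuple, or str.

```python
from itertools import groupby


def unique3(seq, keepstr=True):
    """Order-preserving dedup of a list, tuple, or str (illustrative sketch)."""
    t = type(seq)
    if t is str:
        t = "".join if keepstr else list
    try:
        # Fast path: every item is hashable.
        seen = set()
        # set.add returns None, so a new item is recorded and then kept.
        return t([c for c in seq if not (c in seen or seen.add(c))])
    except TypeError:
        pass  # some item is unhashable; try sorting next
    try:
        # Sort (index, value) pairs by value, then index, so equal values
        # sit next to each other with the earliest occurrence first.
        s = sorted(enumerate(seq), key=lambda iv: (iv[1], iv[0]))
        firsts = [next(g) for _, g in groupby(s, key=lambda iv: iv[1])]
        firsts.sort(key=lambda iv: iv[0])  # restore original order
        return t([v for _, v in firsts])
    except TypeError:
        # Items are neither hashable nor mutually orderable: brute force.
        seen = []
        return t([c for c in seq if not (c in seen or seen.append(c))])
```

In Python 3, comparing unrelated types raises TypeError, so the final fallback is reached for mixed input such as [1, 'a', [2]].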

Note that all the schemes fail if seq is some arbitrary iterable
rather than one of the built-in sequence types, since each one calls
type(seq) to rebuild the result.
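
One possible way around that limitation, sketched in Python 3, is to yield items lazily and let the caller pick the container. The name iter_unique is hypothetical and this is only an illustration of the idea, not something proposed in the thread:

```python
def iter_unique(iterable):
    """Yield each distinct item of any iterable once, in first-seen order."""
    seen_hashable = set()   # fast membership test for hashable items
    seen_other = []         # linear-scan fallback for unhashable items
    for item in iterable:
        try:
            if item in seen_hashable:
                continue
            seen_hashable.add(item)
        except TypeError:
            # Unhashable item: the set lookup raised, so scan the list.
            if item in seen_other:
                continue
            seen_other.append(item)
        yield item
```

Because it consumes its input in a single pass and never calls type(seq), it should work on generators and file objects as well as lists; a caller who wants a string back can wrap the result in ''.join(...).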

I think these iterator approaches get more readable as one becomes
used to them.


