Merging ordered lists

Peter Otten __peter__ at web.de
Tue Jun 3 04:22:55 EDT 2008


etal wrote:

> > def unique(items):
> >     u = set(items)
> >     if len(u) == len(items):
> >         return items
> >     result = []
> >     for item in items:
> >         if item in u:
> >             result.append(item)
> >             u.remove(item)
> >     return result
> 
> You did right by preserving the original (non-alphabetical) ordering,
> but I'm less enthusiastic about the shape of this function. My
> original function used 7 lines of code, and only 1 for the unique()
> step. This uses up to three container objects. Is it really an
> improvement?

Yes :)

Seriously, you are using O(n) containers and O(n) lookup where mine uses
O(1). For short lists it doesn't matter, but as the list length grows the
difference gets huge:

$ cat unique.py
def unique(items):
    u = set(items)
    if len(u) == len(items):
        return items
    result = []
    for item in items:
        if item in u:
            result.append(item)
            u.remove(item)
    return result


def unique_lc(ref):
    return [r for i, r in enumerate(ref) if r and r not in ref[i+1:]]


sample_a = range(1000)
sample_b = range(1000)
import random
for i in random.sample(sample_b, 10):
    sample_b[i] = 0

$ python -m timeit -s"from unique import unique as u, sample_a as s" "u(s)"
10000 loops, best of 3: 52.8 usec per loop
$ python -m timeit -s"from unique import unique_lc as u, sample_a as
s" "u(s)"
10 loops, best of 3: 44.2 msec per loop

3 orders of magnitude for the shortcut.

$ python -m timeit -s"from unique import unique as u, sample_b as s" "u(s)"
1000 loops, best of 3: 646 usec per loop
$ python -m timeit -s"from unique import unique_lc as u, sample_b as
s" "u(s)"
10 loops, best of 3: 39 msec per loop

Still 2 orders of magnitude if my unique() does some actual work.

Peter



More information about the Python-list mailing list