R: unique items in lists

Remco Gerlich scarblac at pino.selwerd.nl
Tue Mar 27 08:09:12 EST 2001


<micampe at f2s.com> wrote in comp.lang.python:
> Here is what I'm doing now:
> 
>         for w in words:
>             while words.count(w) > 1:
>                 words.remove(w)
> 
> comments?

First, you modify a list that you are looping over. That can give
unpredictable results, but in this case it seems to work.
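
For instance, here's a small made-up illustration (not your code) of how
removing while looping can skip elements: removal shifts later items into
positions the loop has already passed.

    lst = [1, 2, 3, 4]
    for x in lst:
        lst.remove(x)      # try to remove every element...
    print(lst)             # prints [2, 4]; half were never visited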

Second, it will take a long time. Both words.count(w) and words.remove(w)
take time proportional to the length of the list. The count is done at least
once for every word the outer loop visits, and once more (together with a
remove) for every duplicate, so in total the time grows roughly with N^2 for
a list of length N.

The dictionary method does one dictionary operation for each word, and then a
single dict.keys() call, which takes time proportional to the number of unique
words. In total it takes time proportional to N.
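
(In case it got snipped: the dictionary version I mean looks roughly like
this. It's a sketch of the idea rather than the exact code from my earlier
post.)

    def make_unique_list(l):
        d = {}
        for word in l:
            d[word] = 1        # one dictionary store per word
        return d.keys()        # the unique words, in no particular order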

For a longer list, N^2 becomes much, much bigger than N, so your solution is
slower, unfortunately.

My function returns a new list and doesn't change the old one; if you want to
change the old list in place instead, you can do that with

def make_unique_list(l):
    d = {}                     # collect each word once as a dictionary key
    for word in l:
        d[word] = 1
    l[:] = d.keys()            # slice assignment changes the old list in place

Try them both out, I'd say :). For smaller lists they're both quick, but
when you get to lengths over a few thousand, the difference is obvious.
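
Something along these lines (made-up words, sizes and function names, just to
show the shape of a quick test) makes the gap visible:

    import random, time

    words = [random.choice(['spam', 'eggs', 'ham', 'knights'])
             for i in range(20000)]

    def unique_quadratic(words):
        words = words[:]                  # work on a copy
        for w in words:
            while words.count(w) > 1:
                words.remove(w)
        return words

    def unique_dict(words):
        d = {}
        for w in words:
            d[w] = 1
        return d.keys()

    for f in (unique_quadratic, unique_dict):
        start = time.time()
        f(words)
        print(f.__name__, time.time() - start)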

-- 
Remco Gerlich


