Binary search tree

Tue Nov 13 03:45:51 EST 2007

En Mon, 12 Nov 2007 16:21:36 -0300, Scott SA <pydev at rscorp.ab.ca> escribió:

> I decided to test the speeds of the four methods:

(but one should always check for correctness before checking speed)

> def dict_example(urls):
>     d = {}
>     for url in urls:
>         if url in d:
>             d[url] = 1

The above does nothing more than iterate over all urls.

> For the simple quest of "Find the unique URLs in a list", the 'dict'  
> example is by far the quickest.

After correction, there is no significative difference between  
dict_example and set_example (sometimes one is slightly slower than the  
other, and sometimes it's the other way). But the simpler s = set(urls)  
suggested by Terry Reedy clearly wins (at least on my tests).

> Buuut, if you need the indexes and possibly a 'hit count', then the  
> enumerated example seems about the best (of the possibilities tested)

In this case there is another alternative using defaultict:

def defaultdict_example(urls):
     d = defaultdict(list)
     for idx,url in enumerate(urls):
         d[url].append(idx)

which is faster than setdefault_enum (again, on my tests).

Another point worth menction, I did shuffle(dummy_urls) before testing.  
Having all of them ordered may introduce some bias.

-- 
Gabriel Genellina