testing for uniquness in a large list

Josiah Carlson jcarlson at uci.edu
Wed Oct 20 13:26:14 EDT 2004


bearophileHUGS at lycos.com (bearophile) wrote:

> Alex Martelli's solution is *very* good, but there is a sampling
> problem: putting a simple printing inside his program:
> 
> if not (len(results) % 5000):
>     print len(results)
> 
> You can see that it slows down a lot when the dictionary contains
> about 100000-120000 different sequences, because there are many
> collisions, and it keeps slowing down. Probably a little speed up of
> this code cannot help. A different algoritm can be useful.
> I don't know... direct sequences generation doesn't seem possibile.
> Maybe a partially direct generation can be okay.

There are only 125,970 unique sequences of 12 items from 20 where order
does not matter (20!)/(12!8!), which is what Alex was generating, and
what "Lol McBride" asked for.

Remove the "ordering does not matter" constraint (by removing the
list.sort() call) will complete much faster as the space to select from
becomes 60,339,831,552,000 in size.

Math saves the day!

 - Josiah




More information about the Python-list mailing list