Best way to handle large lists?

Tue Oct 3 11:02:52 EDT 2006

Larry Bates wrote:
> Chaz Ginger wrote:
>> I have a system that has a few lists that are very large (thousands or
>> tens of thousands of entries) and some that are rather small. Many times
>> I have to produce the difference between a large list and a small one,
>> without destroying the integrity of either list. I was wondering if
>> anyone has any recommendations on how to do this and keep performance
>> high? Is there a better way than
>>
>> [ i for i in bigList if i not in smallList ]
>>
>> Thanks.
>> Chaz
> 
> 
> IMHO the only way to speed things up is to know more about the
> actual data in the lists (e.g., are the elements unique, can they
> be sorted, etc.) and take advantage of all that information to
> come up with a "faster" algorithm.  If they are unique, sets
> might be a good choice.  If they are sorted, the bisect module
> might help.  The specifics about the list(s) may yield a faster
> method.
> 
> -Larry

Each item in the list is a fully qualified domain name, e.g.
foo.bar.com. The order of the items in a list is not important. That is
about all there is to say about the lists, other than that the number of
items in a list tops out at about 10,000.
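
Given that, one option might be to build a set from the small list and
keep the list comprehension, so that each membership test is a hash
lookup rather than a scan of smallList. A minimal sketch, assuming the
names are plain strings (and so hashable) and that the order and any
duplicates in bigList should be kept; list_difference and the sample
names are made up for illustration:

```python
def list_difference(big_list, small_list):
    """Return the items of big_list that are not in small_list.

    Neither input list is modified.
    """
    # Building the set costs one pass over small_list; after that each
    # "not in" test is a hash lookup on average instead of a linear scan.
    small = set(small_list)
    return [name for name in big_list if name not in small]


# Hypothetical sample data, not taken from the real lists.
big_list = ["foo.bar.com", "www.example.com", "mail.example.com"]
small_list = ["www.example.com"]
result = list_difference(big_list, small_list)
```

If neither order nor duplicates matter, set(bigList) - set(smallList)
should give the same names as a set instead of a list.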

Chaz