Comparing sequences with range objects

duncan smith duncan at invalid.invalid
Sun Apr 10 20:01:58 EDT 2022


On 10/04/2022 21:20, Antoon Pardon wrote:
> 
> 
> Op 9/04/2022 om 02:01 schreef duncan smith:
>> On 08/04/2022 22:08, Antoon Pardon wrote:
>>>
>>> Well my first thought is that a bitset makes it less obvious to calulate
>>> the size of the set or to iterate over its elements. But it is an idea
>>> worth exploring.
>>>
>>
>>
>>
>> def popcount(n):
>>     """
>>     Returns the number of set bits in n
>>     """
>>     cnt = 0
>>     while n:
>>         n &= n - 1
>>         cnt += 1
>>     return cnt
>>
>> and not tested,
>>
>> def iterinds(n):
>>     """
>>     Returns a generator of the indices of the set bits of n
>>     """
>>     i = 0
>>     while n:
>>         if n & 1:
>>             yield i
>>         n = n >> 1
>>         i += 1
>>
> Sure but these seem rather naive implementation with a time complexity of
> O(n) where n is the maximum number of possible elements. Using these would
> turn my O(n) algorithm in a O(n^2) algorithm.
> 

I thought your main concern was memory. Of course, dependent on various 
factors, you might be able to do much better than the above. But I don't 
know what your O(n) algorithm is, how using a bitset would make it 
O(n^2), or if the O(n^2) algorithm would actually be slower for typical 
n. The overall thing sounds broadly like some of the blocking and 
clustering methods I've come across in record linkage.

Duncan


More information about the Python-list mailing list