Comparing sequences with range objects

duncan smith duncan at invalid.invalid
Fri Apr 8 20:01:50 EDT 2022


On 08/04/2022 22:08, Antoon Pardon wrote:
> 
> Op 8/04/2022 om 16:28 schreef duncan smith:
>> On 08/04/2022 08:21, Antoon Pardon wrote:
>>>
>>> Yes I know all that. That is why I keep a bucket of possible duplicates
>>> per "identifying" field that is examined and use some heuristics at the
>>> end of all the comparing instead of starting to weed out the duplicates
>>> at the moment something differs.
>>>
>>> The problem is, that when an identifying field is judged to be unusable,
>>> the bucket to be associated with it should conceptually contain all other
>>> records (which in this case are the indexes into the population list).
>>> But that will eat a lot of memory. So I want some object that behaves as
>>> if it is a (immutable) list of all these indexes without actually containing
>>> them. A range object almost works, with the only problem it is not
>>> comparable with a list.
>>>
>>
>> Is there any reason why you can't use ints? Just set the relevant bits.
> 
> Well my first thought is that a bitset makes it less obvious to calculate
> the size of the set or to iterate over its elements. But it is an idea
> worth exploring.
> 



def popcount(n):
    """
    Returns the number of set bits in n (n must be non-negative)
    """
    cnt = 0
    while n:
        n &= n - 1  # clears the lowest set bit
        cnt += 1
    return cnt

and, untested,

def iterinds(n):
    """
    Returns a generator of the indices of the set bits of n
    (n must be non-negative)
    """
    i = 0
    while n:
        if n & 1:  # bit i is set
            yield i
        n >>= 1
        i += 1
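
To make the suggestion concrete, here is a rough sketch of how an int could stand in for a bucket of indexes, including the "all records" bucket for an unusable field. The names make_bucket, full_bucket, bucket_size, bucket_indices and population_size are made up for illustration and are not from the code under discussion; treat it as a sketch under those assumptions rather than a finished design.

```python
def make_bucket(indices):
    """Pack an iterable of non-negative indexes into an int bitset."""
    bits = 0
    for i in indices:
        bits |= 1 << i
    return bits


def full_bucket(size):
    """Bitset with bits 0 .. size-1 set, i.e. 'every record'."""
    return (1 << size) - 1


def bucket_size(bits):
    """Number of indexes in the bucket (counts the set bits)."""
    return bin(bits).count("1")


def bucket_indices(bits):
    """Yield the indexes held in the bucket, in ascending order."""
    i = 0
    while bits:
        if bits & 1:
            yield i
        bits >>= 1
        i += 1


population_size = 10  # hypothetical population length
everything = full_bucket(population_size)  # bucket for an unusable field
some = make_bucket([2, 5, 7])              # bucket for a usable field

# Intersection is a single bitwise AND, and equality is plain int equality,
# so the "all records" bucket compares directly with any other bucket.
common = everything & some
print(common == some)                # True
print(list(bucket_indices(common)))  # [2, 5, 7]
print(bucket_size(everything))       # 10
```

The full bucket costs roughly one bit per record instead of one list slot per record, and union and difference are `|` and `& ~` in the same way. On Python 3.10 or later, int.bit_count() returns the same count as bucket_size.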


Duncan



