Comparing sequences with range objects

Antoon Pardon antoon.pardon at vub.be
Fri Apr 8 17:08:10 EDT 2022


Op 8/04/2022 om 16:28 schreef duncan smith:
> On 08/04/2022 08:21, Antoon Pardon wrote:
>>
>> Yes I know all that. That is why I keep a bucket of possible duplicates
>> per "identifying" field that is examined and use some heuristics at the
>> end of all the comparing instead of starting to weed out the duplicates
>> at the moment something differs.
>>
>> The problem is, that when an identifying field is judged to be unusable,
>> the bucket to be associated with it should conceptually contain all 
>> other
>> records (which in this case are the indexes into the population list).
>> But that will eat a lot of memory. So I want some object that behaves as
>> if it is a (immutable) list of all these indexes without actually 
>> containing
>> them. A range object almost works, with the only problem it is not
>> comparable with a list.
>>
>
> Is there any reason why you can't use ints? Just set the relevant bits.

Well my first thought is that a bitset makes it less obvious to calulate
the size of the set or to iterate over its elements. But it is an idea
worth exploring.

-- 
Antoon.


More information about the Python-list mailing list