syntax difference

Bart bc at freeuk.com
Wed Jun 27 08:42:13 EDT 2018


On 27/06/2018 12:42, Peter J. Holzer wrote:
> On 2018-06-27 11:11:37 +1200, Gregory Ewing wrote:
>> Bart wrote:
>>>     x = set(range(10_000_000))
>>>
>>> This used up 460MB of RAM (the original 100M I tried exhausted the memory).
>>>
>>> The advantage of Pascal-style sets is that that same set will occupy
>>> only 1.25MB, as it is a bit-map.
>>
>> That's true, but they're also extremely limited compared to
>> the things you can do with Python sets. They're really two
>> quite different things, designed for different use cases.
>>
>> Compact sets of integers are possible in Python, but not
>> as a built-in type. This suggests that they're not needed
>> much
> 
> Also, when such sets are needed, a simple array of bits may not be
> sufficient. For example, sets may be sparse, or they may have almost all
> except a few bits set. In these cases a tree or RLE representation is
> much smaller and faster. There are a number of such implementations
> (Judy Arrays and Roaring Bitmaps come to mind[1]). Each has its
> advantages and limitations, so its probably better to leave the choice
> to the author.

 From roaringbitmap.org:

"Bitsets, also called bitmaps, are commonly used as fast data 
structures. Unfortunately, they can use too much memory. To compensate, 
we often use compressed bitmaps."

So from one extreme to another: from the 300-400 bits per element used 
by Python's set type, to some presumably tiny fraction of a bit [on 
average] used in this scheme, as 1 bit per element is too much.

Is there no place at all for an ordinary, straightforward, uncompressed 
bit-set where each element takes exactly one bit? It does seem as though 
Python users have an aversion to simplicity and efficiency.

FWIW when I was working with actual image bitmaps, they were nearly 
always uncompressed when in memory. 1-bit-per-pixel images, for mono 
images or masks, would pack 8 pixels per byte. Compression would only be 
used for storage or transmission.

And the same with all the set types I've used.

However there is a slight difference with the concept of a 'set' as I 
used it, and as it was used in Pascal, compared with Python's set: there 
was the notion of the overall size of the set: the total number of 
elements, whether each was '1' or '0'.

So a set type that represented all ASCII codes would have a size of 128 
elements, indexed 0 to 127. So such a set initialised to ['A'..'Z'] and 
then inverted would give the set [0..64,91..127].

-- 
bart



More information about the Python-list mailing list