Efficiency of using long integers to hold bitmaps

Bengt Richter bokr at oz.net
Sun Jul 10 16:11:28 EDT 2005


On Mon, 11 Jul 2005 02:37:21 +1000, "Jeff Melvaine" <jeffm at rivernet.com.au> wrote:

>I note that I can write expressions like "1 << 100" and the result is stored 
>as a long integer, which means it is stored as an integer of arbitrary 
>length.  I may need to use a large number of these, and am interested to 
>know whether the storage efficiency of long integers is in danger of 
>breaking my code if I use too many.  Would I do better to write a class that 
>defines bitwise operations on arrays of integers, each integer being assumed 
>to contain at most 32 bits?  I cannot find information in the Python manuals 
>for 2.4.1 that would allow me to resolve this question; perhaps the 
>intention is that programmers should not rely on implementation details.
>
>Thanks in advance,
>
Sounds like a possible^H^H^H^H^H^H^H^Hprobable premature optimization worry ;-)

What is a "large number of these" going to amount to? How many, tops?
And how many bits in each? How many operations between them? (Since integers
are immutable, operations mean allocation of space for new ones for results
and disposing of unused garbage ones (probably back to a special fast pool for
integers and longs)). Are you interested in a speed/memory tradeoff?

If your bit vectors are extremely large and sparse (have only a few bits "on"),
you might consider sets (import sets and help(sets)) of the bit numbers as representations.

BTW, I wonder if anyone has written an ordered bit set class in C yet. I was tempted ;-)

How much memory do you have? Buying more can be a pretty cheap way of solving space worries
if you are getting paid for your time.

You should be able to subclass int or long as a way of writing your program in terms of
your own bit vector class. Then you can change your mind and change the underlying representation
without changing the code that uses the api. Compared to plain longs it will cost you some speed
to keep converting results to your own type though.

Bottom line, whether your code will "break" due to storage problems or be fast enough
will depend on numbers you didn't provide ;-)

Regards,
Bengt Richter



More information about the Python-list mailing list