Fastest way to convert a byte of integer into a list

John Machin sjmachin at lexicon.net
Sat Jul 14 22:14:40 EDT 2007


On Jul 15, 11:12 am, "mensana... at aol.com" <mensana... at aol.com> wrote:
> On Jul 14, 5:49?pm, Paul McGuire <pt... at austin.rr.com> wrote:
>
>
>
> > On Jul 13, 3:46 pm, "mensana... at aol.com" <mensana... at aol.com> wrote:
>
> > > On Jul 13, 5:17 am, Paul McGuire <pt... at austin.rr.com> wrote:
>
> > > > On Jul 12, 5:34 pm, Godzilla <godzillais... at gmail.com> wrote:
>
> > > > > Hello,
>
> > > > > I'm trying to find a way to convert an integer (8-bits long for
> > > > > starters) and converting them to a list, e.g.:
>
> > > > > num = 255
> > > > > numList = [1,1,1,1,1,1,1,1]
>
> > > > > with the first element of the list being the least significant, so
> > > > > that i can keep appending to that list without having to worry about
> > > > > the size of the integer. I need to do this because some of the
> > > > > function call can return a 2 lots of 32-bit numbers. I have to find a
> > > > > way to transport this in a list... or is there a better way?
>
> > > > Standing on the shoulders of previous posters, I put this together.
>
> > > > -- Paul
>
> > > But aren't we moving backwards? The OP did ask for the fastest way.
>
> > > I put this together (from other posters and my own):
>
> > > import gmpy
> > > import time
>
> > > y = 2**177149 - 1
>
> > > # init list of tuples by byte
> > > bytebits = lambda num : [num >> i & 1 for i in range(8)]
> > > bytes = [ tuple(bytebits(i)) for i in range(256) ]
> > > # use bytes lookup to get bits in a 32-bit integer
> > > bits = lambda num : sum((bytes[num >> i & 255] for i in range(0,32,8)),
> > > ())
> > > # use base-2 log to find how many bits in an integer of arbitrary
> > > length
> > > from math import log,ceil
> > > log_of_2 = log(2)
> > > numBits = lambda num : int(ceil(log(num)/log_of_2))
> > > # expand bits to integers of arbitrary length
> > > arbBits = lambda num : sum((bytes[num >> i & 255] for i in
> > > range(0,numBits(num),8)),())
> > > t0 = time.time()
> > > L = arbBits(y)
> > > t1 = time.time()
> > > print 'Paul McGuire algorithm:',t1-t0
>
> > > t0 = time.time()
> > > L = [y >> i & 1 for i in range(177149)]
> > > t1 = time.time()
> > > print '     Matimus algorithm:',t1-t0
>
> > > x = gmpy.mpz(2**177149 - 1)
> > > t0 = time.time()
> > > L = [gmpy.getbit(x,i) for i in range(177149)]
> > > t1 = time.time()
> > > print '  Mensanator algorithm:',t1-t0
>
> > > ##    Paul McGuire algorithm: 17.4839999676
> > > ##         Matimus algorithm: 3.28100013733
> > > ##      Mensanator algorithm: 0.125
>
> > Oof!  Pre-calculating those byte bitmasks doesn't help at all!  It
> > would seem it is faster to use a single list comp than to try to sum
> > together the precalcuated sublists.
>
> > I *would* say though that it is somewhat cheating to call the other
> > two algorithms with the hardcoded range length of 177149, when you
> > know this is the right range because this is tailored to fit the input
> > value 2**177149-1.  This would be a horrible value to use if the input
> > number were something small, like 5.  I think numBits still helps here
> > to handle integers of arbitrary length (and only adds a slight
> > performance penalty since it is called only once).
>
> I agree. But I didn't want to compare your numBits
> against gmpy's numdigits() which I would normally use.
> But since the one algorithm required a hardcoded number,
> I thought it best to hardcode all three.
>
> I originally coded this stupidly by converting
> the number to a string and then converting to
> a list of integers. But Matimus had already posted
> his which looked a lot better than mine, so I didn't.
>
> But then I got to wondering if Matimus' solution
> requires (B**2+B)/2 shift operations for B bits.
>
> My attempt to re-code his solution to only use B
> shifts didn't work out and by then you had posted
> yours. So I did the 3-way comparison using gmpy's
> direct bit comparison. I was actually surprised at
> the difference.
>
> I find gmpy's suite of bit manipulation routines
> extremely valuable and use them all the time.
> That's another reason for my post, to promote gmpy.
>
> It is also extremely important to keep coercion
> out of loops. In other words, never use literals,
> only pre-coerced constants. For example:
>
> import gmpy
> def collatz(n):
>   ONE = gmpy.mpz(1)
>   TWO = gmpy.mpz(2)
>   TWE = gmpy.mpz(3)
>   while n != ONE:
>     if n % TWO == ONE:
>       n = TWE*n + ONE
>     else:
>       n = n/TWO
> collatz(gmpy.mpz(2**177149-1))
>
>
>
> > -- Paul

Try the following function; it works for arbitrary positive integers,
doesn't need a 3rd party module, doesn't need to worry whether all
that ceil/log/log floating-point stuffing about could result in an off-
by-one error in calculating the number of bits, works with Python
1.5.2, and is competitive with Matimus's method.

def listbits(n):
    result = []
    app = result.append
    while n:
        app(n & 1)
        n = n >> 1
    return result

Cheers,
John




More information about the Python-list mailing list