Fastest way to convert a byte of integer into a list

Sat Jul 14 21:12:07 EDT 2007

On Jul 14, 5:49?pm, Paul McGuire <pt... at austin.rr.com> wrote:
> On Jul 13, 3:46 pm, "mensana... at aol.com" <mensana... at aol.com> wrote:
>
>
>
>
>
> > On Jul 13, 5:17 am, Paul McGuire <pt... at austin.rr.com> wrote:
>
> > > On Jul 12, 5:34 pm, Godzilla <godzillais... at gmail.com> wrote:
>
> > > > Hello,
>
> > > > I'm trying to find a way to convert an integer (8-bits long for
> > > > starters) and converting them to a list, e.g.:
>
> > > > num = 255
> > > > numList = [1,1,1,1,1,1,1,1]
>
> > > > with the first element of the list being the least significant, so
> > > > that i can keep appending to that list without having to worry about
> > > > the size of the integer. I need to do this because some of the
> > > > function call can return a 2 lots of 32-bit numbers. I have to find a
> > > > way to transport this in a list... or is there a better way?
>
> > > Standing on the shoulders of previous posters, I put this together.
>
> > > -- Paul
>
> > But aren't we moving backwards? The OP did ask for the fastest way.
>
> > I put this together (from other posters and my own):
>
> > import gmpy
> > import time
>
> > y = 2**177149 - 1
>
> > # init list of tuples by byte
> > bytebits = lambda num : [num >> i & 1 for i in range(8)]
> > bytes = [ tuple(bytebits(i)) for i in range(256) ]
> > # use bytes lookup to get bits in a 32-bit integer
> > bits = lambda num : sum((bytes[num >> i & 255] for i in range(0,32,8)),
> > ())
> > # use base-2 log to find how many bits in an integer of arbitrary
> > length
> > from math import log,ceil
> > log_of_2 = log(2)
> > numBits = lambda num : int(ceil(log(num)/log_of_2))
> > # expand bits to integers of arbitrary length
> > arbBits = lambda num : sum((bytes[num >> i & 255] for i in
> > range(0,numBits(num),8)),())
> > t0 = time.time()
> > L = arbBits(y)
> > t1 = time.time()
> > print 'Paul McGuire algorithm:',t1-t0
>
> > t0 = time.time()
> > L = [y >> i & 1 for i in range(177149)]
> > t1 = time.time()
> > print '     Matimus algorithm:',t1-t0
>
> > x = gmpy.mpz(2**177149 - 1)
> > t0 = time.time()
> > L = [gmpy.getbit(x,i) for i in range(177149)]
> > t1 = time.time()
> > print '  Mensanator algorithm:',t1-t0
>
> > ##    Paul McGuire algorithm: 17.4839999676
> > ##         Matimus algorithm: 3.28100013733
> > ##      Mensanator algorithm: 0.125
>
> Oof!  Pre-calculating those byte bitmasks doesn't help at all!  It
> would seem it is faster to use a single list comp than to try to sum
> together the precalcuated sublists.
>
> I *would* say though that it is somewhat cheating to call the other
> two algorithms with the hardcoded range length of 177149, when you
> know this is the right range because this is tailored to fit the input
> value 2**177149-1.  This would be a horrible value to use if the input
> number were something small, like 5.  I think numBits still helps here
> to handle integers of arbitrary length (and only adds a slight
> performance penalty since it is called only once).

I agree. But I didn't want to compare your numBits
against gmpy's numdigits() which I would normally use.
But since the one algorithm required a hardcoded number,
I thought it best to hardcode all three.

I originally coded this stupidly by converting
the number to a string and then converting to
a list of integers. But Matimus had already posted
his which looked a lot better than mine, so I didn't.

But then I got to wondering if Matimus' solution
requires (B**2+B)/2 shift operations for B bits.

My attempt to re-code his solution to only use B
shifts didn't work out and by then you had posted
yours. So I did the 3-way comparison using gmpy's
direct bit comparison. I was actually surprised at
the difference.

I find gmpy's suite of bit manipulation routines
extremely valuable and use them all the time.
That's another reason for my post, to promote gmpy.

It is also extremely important to keep coercion
out of loops. In other words, never use literals,
only pre-coerced constants. For example:

import gmpy
def collatz(n):
  ONE = gmpy.mpz(1)
  TWO = gmpy.mpz(2)
  TWE = gmpy.mpz(3)
  while n != ONE:
    if n % TWO == ONE:
      n = TWE*n + ONE
    else:
      n = n/TWO
collatz(gmpy.mpz(2**177149-1))

>
> -- Paul