Snippet: bitcount()
Christian Tismer
tismer at appliedbiometrics.com
Sat Jul 17 07:24:55 EDT 1999
Klaus Alexander Seistrup wrote:
>
> Tim Peters <tim_one at email.msn.com> wrote:
>
> > > -- unless I decide to count several bits at a time, but
> > > I wouldn't know how to construct such a thing.
> >
> > Oh, sure you do <wink>:
> >
> > count[0] = 0
> > count[1] = 1
> > count[2] = 1
> > count[3] = 2
> > ...
> > count[255] = 8
> >
> > Now you can get the popcount of any one-byte int with one table-lookup.
> > And the popcount of any N-byte int or long with N table lookups and N-1
> > adds. Put 2**16 entries in the table to cut the work in half.
>
> I get your point, thanks, but I will spare this group for a posting of such
> a 2**16 table, leave alone the full 32-bit version. ;-)
FYI, here is where Tim and I stopped a "race".
We both claimed to have the fastest version of bit counting.
Apparently, results were dependant from your CPU's cache size.
Mine had 512KB, Tims only 256KB, so on his machine he was faster,
and on mine, I was. We decided to stop there.
Here is Tim's most elegant last version which I like the best:
______________________________________________________________
_run2mask = {1: 0x5555555555555555L,
2: 0x3333333333333333L,
4: 0x0F0F0F0F0F0F0F0FL,
8: 0x00FF00FF00FF00FFL}
def buildmask2(run, n):
run2 = run + run
k = (n + run2 - 1) / run2
n = k * run2
try:
answer = _run2mask[run]
k2 = 64 / run2
except KeyError:
answer = (1L << run) - 1
k2 = 1
while k > k2:
k2 = k2 + k2
if k >= k2:
answer = answer | (answer << (run * k2))
else:
answer = answer | (answer << (run2 * (k - k2/2)))
k2 = k
if k2 > k:
answer = answer >> (run2 * (k2 - k))
return answer, n
def popcount4(x):
lomask, n = buildmask2(1, nbits(x))
x = x - ((x >> 1) & lomask)
target = 2
while n > target:
lomask, n = buildmask2(target, n)
x = (x & lomask) + ((x >> target) & lomask)
target = target + target
for i in range(1, target/2):
if n <= target:
break
n = n >> 1
n = (n + target - 1) / target * target
x = (x & ((1L << n) - 1)) + (x >> n)
return int(x)
______________________________________________________________
And here is my dirty, brutal hack which is at the "same" speed
for sufficiently inexact values of "same":
______________________________________________________________
import marshal, operator, array, string
def makeweighttrans() :
weights = map(popcount, range(256))
return string.join(map(chr, weights), '')
def popcount6(x, dumps = marshal.dumps,
add = operator.add,
trans = makeweighttrans(),
reduce = reduce,
translate = string.translate,
mkarray = array.array):
digitcounts = mkarray("B", translate(dumps(x), trans))
# overwrite type code & digit count
digitcounts[0] = digitcounts[1] = digitcounts[2] = \
digitcounts[3] = digitcounts[4] = 0
return reduce(add, digitcounts)
______________________________________________________________
So that' where you can get when trying to count bits.
I can't encourage you to try to outperform one of the
above and don't want to start another race (at least
I will not try). But if somebody comes up with a solution
which is, say, 30 percent faster for any distribution of
bits, and it uses standard Python only, then I will
give you a free website and pay your fee for the next
Python conference.
and-maybe-employ-you-in-my-company-ly y'rs - chris
p.s.: You can shorten the above by Mike Hudson's bytecodehacks,
but it will not get faster. Actually a tick slower under Win9X,
due to the way how VC++6.0 optimizes LOAD_FAST and LOAD_CONST
slightly different (they *are* technically equivalent).
--
Christian Tismer :^) <mailto:tismer at appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
More information about the Python-list
mailing list