Hamming Distance

Matimus mccredie at gmail.com
Thu Jun 19 20:37:23 EDT 2008


On Jun 19, 4:27 pm, godavemon <davefow... at gmail.com> wrote:
> I need to calculate the Hamming Distance of two integers.  The hamming
> distance is the number of bits in two integers that don't match.  I
> thought there'd be a function in math or scipy but i haven't been able
> to find one.  This is my function but it seems like there should be a
> faster way.  I do this computation many times and speed up is
> important.
>
> def hamdist( a, b , bits = 32):
>     def _hamdist( x, bits):
>         if bits:
>             return (x & 1) + _hamdist(x >> 1, bits-1)
>         return x & 1
>     return _hamdist( a ^ b, bits)
>
> Another alternative would be to convert the XOR to a binary string and
> count the # of 1's.
>
> Which would be fastest?  Are there better alternatives?
>
> Thanks!

I see no good reason to use recursion for this type of thing. Here are
some of my attempts:

[code]
from math import log

def yours(a, b , bits = 32):
     def _hamdist( x, bits):
         if bits:
             return (x & 1) + _hamdist(x >> 1, bits-1)
         return x & 1
     return _hamdist(a ^ b, bits)


def simple(a, b, bits=32):
    x = a ^ b
    return sum((x >> i & 1) for i in xrange(bits))

def lazy(a, b, bits=32):
    x = (a ^ b) & ((1 << bits) - 1)
    tot = 0
    while x:
        tot += x & 1
        x >>= 1
    return tot

def fancy(a, b, bits=32):
    x = (a ^ b) & ((1 << bits) - 1)
    tot = 0
    while x:
        tot += 1
        x ^= 1 << int(log(x, 2))
    return tot

test_vals = (
        ((0xffffffff, 0), 32),
        ((0,0), 0),
        ((1,0), 1),
        ((0x80000000, 0), 1),
        ((0x55555555, 0), 16)
        )

def test(f):
    test_vals = (
            ((0xffffffff, 0), 32), # ALL
            ((0,0), 0), # None
            ((1,0), 1), # First
            ((0x80000000, 0), 1), # Last
            ((0x55555555, 0), 16), # Every Other
            ((0xffff, 0), 16), # First Half
            ((0xffff0000, 0), 16), # Last Half
            )
    for i, (args, exp) in enumerate(test_vals):
        if f(*args) != exp:
            return 0
    return 1

if __name__ == "__main__":
    for f in (yours, simple, lazy, fancy):
        if not test(f):
            print "%s failed"%f.__name__
[/code]

The python module `timeit` is handy for testing speed:

python -mtimeit -s"from hamdist import *" "test(yours)"
10000 loops, best of 3: 95.1 usec per loop

python -mtimeit -s"from hamdist import *" "test(simple)"
10000 loops, best of 3: 65.3 usec per loop

python -mtimeit -s"from hamdist import *" "test(lazy)"
10000 loops, best of 3: 59.8 usec per loop

python -mtimeit -s"from hamdist import *" "test(fancy)"
10000 loops, best of 3: 77.2 usec per loop

Even the ridiculous `fancy` version beat the recursive version.

Matt



More information about the Python-list mailing list