Hamming Distance

godavemon davefowler at gmail.com
Thu Jun 19 23:08:51 EDT 2008


Great thanks!

On Jun 19, 5:37 pm, Matimus <mccre... at gmail.com> wrote:
> On Jun 19, 4:27 pm, godavemon <davefow... at gmail.com> wrote:
>
>
>
> > I need to calculate the Hamming Distance of two integers.  The hamming
> > distance is the number of bits in two integers that don't match.  I
> > thought there'd be a function in math or scipy but i haven't been able
> > to find one.  This is my function but it seems like there should be a
> > faster way.  I do this computation many times and speed up is
> > important.
>
> > def hamdist( a, b , bits = 32):
> >     def _hamdist( x, bits):
> >         if bits:
> >             return (x & 1) + _hamdist(x >> 1, bits-1)
> >         return x & 1
> >     return _hamdist( a ^ b, bits)
>
> > Another alternative would be to convert the XOR to a binary string and
> > count the # of 1's.
>
> > Which would be fastest?  Are there better alternatives?
>
> > Thanks!
>
> I see no good reason to use recursion for this type of thing. Here are
> some of my attempts:
>
> [code]
> from math import log
>
> def yours(a, b , bits = 32):
>      def _hamdist( x, bits):
>          if bits:
>              return (x & 1) + _hamdist(x >> 1, bits-1)
>          return x & 1
>      return _hamdist(a ^ b, bits)
>
> def simple(a, b, bits=32):
>     x = a ^ b
>     return sum((x >> i & 1) for i in xrange(bits))
>
> def lazy(a, b, bits=32):
>     x = (a ^ b) & ((1 << bits) - 1)
>     tot = 0
>     while x:
>         tot += x & 1
>         x >>= 1
>     return tot
>
> def fancy(a, b, bits=32):
>     x = (a ^ b) & ((1 << bits) - 1)
>     tot = 0
>     while x:
>         tot += 1
>         x ^= 1 << int(log(x, 2))
>     return tot
>
> test_vals = (
>         ((0xffffffff, 0), 32),
>         ((0,0), 0),
>         ((1,0), 1),
>         ((0x80000000, 0), 1),
>         ((0x55555555, 0), 16)
>         )
>
> def test(f):
>     test_vals = (
>             ((0xffffffff, 0), 32), # ALL
>             ((0,0), 0), # None
>             ((1,0), 1), # First
>             ((0x80000000, 0), 1), # Last
>             ((0x55555555, 0), 16), # Every Other
>             ((0xffff, 0), 16), # First Half
>             ((0xffff0000, 0), 16), # Last Half
>             )
>     for i, (args, exp) in enumerate(test_vals):
>         if f(*args) != exp:
>             return 0
>     return 1
>
> if __name__ == "__main__":
>     for f in (yours, simple, lazy, fancy):
>         if not test(f):
>             print "%s failed"%f.__name__
> [/code]
>
> The python module `timeit` is handy for testing speed:
>
> python -mtimeit -s"from hamdist import *" "test(yours)"
> 10000 loops, best of 3: 95.1 usec per loop
>
> python -mtimeit -s"from hamdist import *" "test(simple)"
> 10000 loops, best of 3: 65.3 usec per loop
>
> python -mtimeit -s"from hamdist import *" "test(lazy)"
> 10000 loops, best of 3: 59.8 usec per loop
>
> python -mtimeit -s"from hamdist import *" "test(fancy)"
> 10000 loops, best of 3: 77.2 usec per loop
>
> Even the ridiculous `fancy` version beat the recursive version.
>
> Matt






More information about the Python-list mailing list