Bit substring search
Scott David Daniels
Scott.Daniels at Acm.Org
Wed Jun 25 09:11:05 EDT 2008
Kris Kennaway wrote:
> Thanks for the pointers, I think a C extension will end up being the way
> to go, unless someone has beaten me to it and I just haven't found it yet.
Depending on the pattern length you are targeting, it may be fastest to
increase the out-of-loop work. For a 40-bit string, build an 8-target
Aho-Corasick machine, and at each match check the endpoints. This will
only work well if 40 bits is at the low end of what you are hunting for.
Roughly:
targets[0] = 5-byte string as byte-aligned
targets[N in 1..7] = 4-byte strings representing the lead
four bytes after discarding the high-order N bits
m = AhoCorasick(targets)
def hunt(m, source):
m.state = 0
old_block = None
for n, block in enumerate(source):
for recognized, where in m.search(block):
if recognized:
if <check lead and trail bits>:
yield block, where, recognized
else:
yield block, where, 0
--Scott David Daniels
Scott.Daniels at Acm.Org
More information about the Python-list
mailing list