Middle matching - any Python library functions (besides re)?

bearophileHUGS at lycos.com bearophileHUGS at lycos.com
Sun Aug 27 20:40:24 EDT 2006


If you want to avoid an O(n^2) algorithm, you may need to find a
signature for each file. Then you use such signatures to compute
hashes, and unique them with a dict (dict values may be the file
names). Later you can weed out the few wrong collisions produced by the
possibly approximated signature. If you can determine of a given a file
where its heading garbage stops, then you can compute the signature
just computing the python hash of some of the following bytes (30 or
200 byte may suffice).

Bye,
bearophile




More information about the Python-list mailing list