CRC-module

Michael Hudson mwh21 at cam.ac.uk
Wed Nov 24 10:40:31 EST 1999


Thomas Weholt <thomas at bibsyst.no> writes:

> Hi,
> 
> Ok, so I've looked into zlib.crc32 and zlib.adler32. They seem easy
> enough to use, but I thought crc-codes had characters and numbers in
> them, not just a plain integer like the methods above return. ( As you
> can see, I'm a complete ass on this subject, but don't have time to do
> the proper research myself, and was hoping for a "quick fix" ... )

Well, on one level there's not much difference between a binary string
and an integer. But crc32 returns a 32-bit value, so it's most
convenient/efficient to store it in an integer.
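
The "characters and numbers" form is usually just that integer printed
in hexadecimal. A minimal sketch, assuming a current Python 3; the
helper name crc32_hex is made up for illustration:

```python
import zlib

def crc32_hex(data: bytes) -> str:
    """Return the CRC-32 of data as eight lowercase hex digits."""
    # Mask to 32 bits so the value is unsigned on every Python version.
    return "%08x" % (zlib.crc32(data) & 0xFFFFFFFF)

# "123456789" is the conventional CRC-32 check input.
print(crc32_hex(b"123456789"))  # prints cbf43926
```

The integer and the hex string carry the same 32 bits, so comparing
either form gives the same answer.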
 
> A friend of mine mentioned that I should try SHA-1 instead, for more
> accuracy. Can anybody give me an example on how to compute crc-codes,
> using zlib or preferably some more accurate method, for single files?

Well, comparing crc32 and SHA-1 or md5 isn't really comparing like
with like, to the (small) extent of my knowledge on the matter: crc32
(AFAIK) is designed to spot accidental transmission errors, whereas
sha-1/md5 are (certainly) designed to spot malicious modification.

Also md5 is 128 bits and sha-1 is 160, so two different files are far
less likely to share one of these digests by accident than to share a
32-bit crc32.
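
The sizes quoted above can be checked directly. This is only a sketch,
assuming Python 3's hashlib module (which is not what this post used);
digest_bits is a hypothetical helper name:

```python
import hashlib

def digest_bits(name: str) -> int:
    """Return the digest length in bits for a hashlib algorithm name."""
    # digest_size is reported in bytes, so multiply by 8 for bits.
    return hashlib.new(name).digest_size * 8

for name in ("md5", "sha1"):
    print(name, digest_bits(name))
```

For comparison, crc32 always fits in 32 bits.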

> If this is all it takes :
> 
> crc = module_name.crc_method(file)
> 
> and comparison is done like :
> 
> if (crc1 == crc2): print "Equal."
> else : print "Different."
> 
> then all I need is the name of the most effective/accurate module to
> use.
> 
> If, for some strange reason, I should use one module instead of another,
> that info would be interesting too.

For that application, it'd probably be best to use sha, e.g.:

import sha
# open in binary mode so the bytes are hashed exactly as stored on disk
sha.sha(open(filename, 'rb').read()).digest()

Efficiency-wise that'll be IO bound, so the fact that SHA-1 is more
CPU-intensive (I think) than crc32 shouldn't be relevant.
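
For large files, reading everything in one go can use a lot of memory.
A rough sketch of the same comparison done in chunks, assuming Python 3,
where hashlib replaces the old sha module; the names file_sha1,
same_content and chunk_size are made up for illustration:

```python
import hashlib

def file_sha1(path, chunk_size=65536):
    """Return the SHA-1 hex digest of the file at path."""
    h = hashlib.sha1()
    # Binary mode, so the bytes are hashed exactly as stored on disk.
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_content(path1, path2):
    """Compare two files by their SHA-1 digests."""
    return file_sha1(path1) == file_sha1(path2)
```

Usage would be along the lines of same_content("a.dat", "b.dat"), which
is the "Equal." / "Different." test from the question. hexdigest() gives
the letters-and-digits form directly.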

HTH,
Michael



