md5 and large files
Andrew Dalke
adalke at mindspring.com
Sun Oct 17 21:32:48 EDT 2004
Nelson Minar wrote:
> If all you want to do is verify that a file is not corrupt, MD5 is the
> wrong algorithm to use. Use something fast like crc32.
How much faster is that in Python? It looks about the
same to me.
>>> import binascii, md5, os, time
>>> def crc32file(infile):
...     crc = 0
...     while 1:
...         s = infile.read(16384)
...         if not s:
...             return crc
...         crc = binascii.crc32(s, crc)
...
>>> def md5file(infile):
...     md5obj = md5.new()
...     while 1:
...         s = infile.read(16384)
...         if not s:
...             return md5obj.hexdigest()
...         md5obj.update(s)
...
>>> os.path.getsize("/Users/dalke/databases/sprot/sprot40.dat")
320673785L
>>> if 1:
...     t1 = time.time()
...     print md5file(open("/Users/dalke/databases/sprot/sprot40.dat", "rb"))
...     t2 = time.time()
...     print t2-t1
...
a2f54de61e4db857aadce04298ab177e
10.9378840923
>>> if 1:
...     t1 = time.time()
...     print crc32file(open("/Users/dalke/databases/sprot/sprot40.dat", "rb"))
...     t2 = time.time()
...     print t2-t1
...
-1921799528
10.7424199581
>>>
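The session above is Python 2; the `md5` module no longer exists in Python 3. A rough Python 3 equivalent of the two helpers might look like the sketch below, assuming `hashlib.md5` and `zlib.crc32` as the replacements and a file opened in binary mode. One difference to expect: in Python 3 `zlib.crc32` returns an unsigned value, so the -1921799528 printed above would show up as 2373167768 (that is, -1921799528 & 0xffffffff).

```python
import hashlib
import zlib

BLOCKSIZE = 16384  # same chunk size as the session above


def crc32file(infile, blocksize=BLOCKSIZE):
    """Return the CRC-32 of a binary file object, reading it in chunks."""
    crc = 0
    while True:
        s = infile.read(blocksize)
        if not s:
            return crc
        # zlib.crc32 takes the running value as its second argument;
        # in Python 3 the result is always unsigned.
        crc = zlib.crc32(s, crc)


def md5file(infile, blocksize=BLOCKSIZE):
    """Return the hex MD5 digest of a binary file object, reading it in chunks."""
    md5obj = hashlib.md5()
    while True:
        s = infile.read(blocksize)
        if not s:
            return md5obj.hexdigest()
        md5obj.update(s)
```

Usage is the same as before, e.g. `with open(path, "rb") as f: print(md5file(f))`; the `with` block also closes the file, which the interactive version never did.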
I suspect most of the time is spent reading the file, not computing
the checksum. That's probably true even for a version written in C.
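One way to check the I/O hypothesis is to time a read-only pass over the file and, separately, time both checksums on bytes already in memory. This is a sketch rather than anything from the original post. It assumes Python 3 (`time.perf_counter`), and `best_time`, `compare_in_memory`, and `time_read_only` are hypothetical names chosen for illustration.

```python
import hashlib
import time
import zlib


def best_time(func, data, repeat=3):
    """Return the fastest wall-clock time (seconds) of func(data) over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        t1 = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - t1)
    return best


def compare_in_memory(data, repeat=3):
    """Time MD5 and CRC-32 on bytes already in memory, so no disk I/O is measured."""
    return {
        "md5": best_time(lambda d: hashlib.md5(d).hexdigest(), data, repeat),
        "crc32": best_time(zlib.crc32, data, repeat),
    }


def time_read_only(path, blocksize=16384):
    """Time reading the file in chunks with no checksum, as an I/O baseline."""
    t1 = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(blocksize):
            pass
    return time.perf_counter() - t1
```

If both in-memory times come out well below the read-only time, the file read is the bottleneck. Loading a 300 MB file with `f.read()` for the in-memory test needs that much free RAM. Also keep in mind that the operating system's page cache can make a second read of the same file much faster than the first, so run order affects the numbers.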
Andrew
dalke at dalkescientific.com