Efficient checksum calculation on large files

Christos TZOTZIOY Georgiou tzot at sil-tec.gr
Tue Feb 8 11:34:45 EST 2005


On Tue, 08 Feb 2005 16:13:43 +0000, rumours say that Robin Becker
<robin at reportlab.com> might have written:

>Ola Natvig wrote:
>> Hi all
>> 
>> Does anyone know of a fast way to calculate checksums for a large file?
>> I need a way to generate ETag keys for a webserver; ETags for large
>> files are not really necessary, but it would be nice if I could do it.
>> I'm using the Python hash function on dynamically generated strings
>> (like page content), but for things like images I use shutil's
>> copyfileobj function, and the hash of a file object is just its
>> handler's memory address.
>> 
>> Does anyone know of a Python utility I could use, perhaps something
>> like the md5sum utility on *nix systems?
>> 
>> 
>Well, md5sum is usable on many systems; I run it on win32 and darwin.

[snip use of some md5sum.exe]

Why not use the md5 module?

The following md5sum.py is in use and tested, but not "failproof".

|import sys, md5
|from glob import glob
|
|for arg in sys.argv[1:]:
|    # expand wildcards ourselves, for shells (e.g. cmd.exe) that don't
|    for filename in glob(arg):
|        fp = open(filename, "rb")
|        md5sum = md5.new()
|        # hash in 64 KiB chunks so large files never have to fit in memory
|        while True:
|            data = fp.read(65536)
|            if not data:
|                break
|            md5sum.update(data)
|        fp.close()
|        print md5sum.hexdigest(), filename
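
You invoke it like the Unix md5sum utility; for example (the glob call
above is what lets it expand wildcards itself on shells that don't):

|C:\> python md5sum.py *.jpg images\*.png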

It's fast enough, especially if you cache results.
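
For instance, a minimal caching sketch (assuming an in-process dict keyed
by the file's mtime and size is good enough for your webserver; the etag()
helper and its cache layout are just one possible arrangement, not
anything from the md5 module):

|import os, md5
|
|_etag_cache = {}  # filename -> (mtime, size, hexdigest)
|
|def etag(filename):
|    # recompute the MD5 digest only when the file looks changed
|    st = os.stat(filename)
|    cached = _etag_cache.get(filename)
|    if cached and cached[0] == st.st_mtime and cached[1] == st.st_size:
|        return cached[2]
|    fp = open(filename, "rb")
|    digest = md5.new()
|    while True:
|        data = fp.read(65536)
|        if not data:
|            break
|        digest.update(data)
|    fp.close()
|    result = digest.hexdigest()
|    _etag_cache[filename] = (st.st_mtime, st.st_size, result)
|    return result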
-- 
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...


