Efficient checksum calculation on large files
Christos TZOTZIOY Georgiou
tzot at sil-tec.gr
Tue Feb 8 11:34:45 EST 2005
On Tue, 08 Feb 2005 16:13:43 +0000, rumours say that Robin Becker
<robin at reportlab.com> might have written:
>Ola Natvig wrote:
>> Hi all
>>
>> Does anyone know of a fast way to calculate checksums for a large file?
>> I need a way to generate ETag keys for a web server; ETags for large
>> files are not really necessary, but it would be nice if I could do it.
>> I'm using the Python hash function on dynamically generated strings
>> (like page content), but for things like images I use shutil's
>> copyfileobj function, and the hash of a file object is just its
>> handle's memory address.
>>
>> Does anyone know of a Python utility I could use, perhaps something
>> like the md5sum utility on *nix systems?
>>
>>
>Well, md5sum is available on many systems. I run it on win32 and darwin.
[snip use of some md5sum.exe]
Why not use the md5 module?
The following md5sum.py is in use and tested, but not "foolproof".
|import sys, md5
|from glob import glob
|
|for arg in sys.argv[1:]:
|    # expand wildcards ourselves; the Windows shell does not do it
|    for filename in glob(arg):
|        fp = file(filename, "rb")
|        md5sum = md5.new()
|        # read in 64 KiB chunks so memory use stays constant,
|        # however large the file is
|        while True:
|            data = fp.read(65536)
|            if not data:
|                break
|            md5sum.update(data)
|        fp.close()
|        print md5sum.hexdigest(), filename
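Usage mirrors the *nix utility, e.g.:
|C:\>python md5sum.py *.iso readme.txt
The explicit glob() call is what makes wildcards work even on Windows,
where the shell does not expand them for you.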
It's fast enough, especially if you cache results.
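For Ola's ETag case, "cache results" can be as simple as keying the
digest on the file's path, mtime and size, so a file is only re-read
after it changes. A minimal sketch of that idea (the etag() name and
the module-level dict are mine, not code from any actual server):
|import os, md5
|
|_etag_cache = {}   # maps (path, mtime, size) -> hex digest
|
|def etag(path, blocksize=65536):
|    # recompute only when the file's mtime or size changes
|    st = os.stat(path)
|    key = (path, st.st_mtime, st.st_size)
|    if key not in _etag_cache:
|        fp = file(path, "rb")
|        md5sum = md5.new()
|        while True:
|            data = fp.read(blocksize)
|            if not data:
|                break
|            md5sum.update(data)
|        fp.close()
|        _etag_cache[key] = md5sum.hexdigest()
|    return _etag_cache[key]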
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...