Efficient checksum calculation on large files

Fredrik Lundh fredrik at pythonware.com
Tue Feb 8 11:26:07 EST 2005


Robin Becker wrote:

>> Does anyone know of a fast way to calculate checksums for a large file? I need a way to
>> generate ETag keys for a webserver. ETags for large files are not really necessary, but it
>> would be nice if I could do it. I'm using the Python hash function on dynamically generated
>> strings (like page content), but for things like images I use shutil's copyfileobj function,
>> and the hash of a file object is just its handler's memory address.
>>
>> Does anyone know of a Python utility I could use, perhaps something like the md5sum
>> utility on *nix systems?
>>
> well, md5sum is available on many systems. I run it on win32 and darwin.
>
> I tried this in 2.4 with the new subprocess module
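(for reference, the subprocess approach presumably looks something like the sketch below — the exact code wasn't quoted, so this is an assumption; it relies on an md5sum binary being on the PATH, and on md5sum printing the hex digest as the first whitespace-separated field of its output)

    import subprocess

    def md5sum(fn):
        # run the external md5sum utility and capture its stdout;
        # the hex digest is the first field of the output line
        out = subprocess.Popen(["md5sum", fn],
                               stdout=subprocess.PIPE).communicate()[0]
        return out.split()[0].decode("ascii")
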

on my machine, Python's md5+mmap is a little bit faster than
subprocess+md5sum:

    import os, md5, mmap

    file = open(fn, "rb")
    size = os.path.getsize(fn)
    data = mmap.mmap(file.fileno(), size, access=mmap.ACCESS_READ)
    hash = md5.md5(data).hexdigest()

(I suspect that md5sum also uses mmap, so the difference is
probably just the subprocess overhead)
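if mmap is not an option (e.g. the file might not fit in the address space on a 32-bit box), a plain chunked read keeps memory use constant and is nearly as fast. a sketch, using the hashlib module that replaces md5 in later Pythons (2.5+):

    import hashlib

    def file_md5(path, blocksize=64 * 1024):
        # read the file in fixed-size chunks so memory use stays
        # constant no matter how large the file is
        h = hashlib.md5()
        f = open(path, "rb")
        try:
            while True:
                block = f.read(blocksize)
                if not block:
                    break
                h.update(block)
        finally:
            f.close()
        return h.hexdigest()

the digest is identical to hashing the whole file in one go, so the resulting ETags stay stable regardless of the block size chosen.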

</F> 

More information about the Python-list mailing list