Efficient checksum calculation on large files

Michael Hoffman cam.ac.uk at mh391.invalid
Tue Feb 8 11:27:20 EST 2005


Ola Natvig wrote:

> Does anyone know of a fast way to calculate checksums for a large file?
> I need a way to generate ETag keys for a webserver. ETags for large
> files are not really necessary, but it would be nice if I could do it.
> I'm using the Python hash function on dynamically generated strings
> (like page content), but for things like images I use shutil's
> copyfileobj function, and the hash of a file object is just its
> handle's memory address.
> 
> Does anyone know of a Python utility I could use for this, perhaps 
> something like the md5sum utility on *nix systems?

Is there a reason you can't use the sha module? Using a random large file I had
lying around:

import sha
sha.new(file("jdk-1_5_0-linux-i586.rpm", "rb").read()).hexdigest() # loads all into memory first

If you don't want to load the whole file into memory at once, you can always call out to the sha1sum utility yourself:

>>> import subprocess
>>> subprocess.Popen(["sha1sum", ".bashrc"], stdout=subprocess.PIPE).communicate()[0].split()[0]
'5c59906733bf780c446ea290646709a14750eaad'
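
Alternatively, you can stream the file through the sha module in chunks, so memory use stays constant no matter how big the file is. A minimal sketch (the sha1_file name and the 64 KB chunk size are my own choices, not anything standard):

import sha

def sha1_file(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = sha.new()
    f = file(path, "rb")
    try:
        # Feed the digest one chunk at a time so only chunk_size
        # bytes are ever held in memory.
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    finally:
        f.close()
    return digest.hexdigest()

This should produce the same digest as sha1sum, without spawning a subprocess.
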
-- 
Michael Hoffman


