Efficient checksum calculating on large files

Thomas Heller theller at python.net
Tue Feb 8 13:12:45 EST 2005


Nick Craig-Wood <nick at craig-wood.com> writes:

> Ola Natvig <ola.natvig at infosense.no> wrote:
>>  Hi all
>> 
>>  Does anyone know of a fast way to calculate checksums for a large file?
>>  I need a way to generate ETag keys for a webserver. ETags for large
>>  files are not really necessary, but it would be nice if I could do it. I'm
>>  using the Python hash function on the dynamically generated strings (like
>>  page content), but for things like images I use shutil's
>>  copyfileobj function, and the hash of a file object is just its
>>  handler's memory address.
>> 
>>  Does anyone know of a Python utility I could use, perhaps
>>  something like the md5sum utility on *nix systems?
>
> Here is an implementation of md5sum in Python.  It's the same speed,
> give or take, as md5sum itself.  This isn't surprising, since md5sum is
> dominated by CPU usage of the MD5 routine (in C in both cases) and/or
> I/O (also in C).

Your code won't work correctly on Windows, since you have to open files
with mode 'rb'.

But there's a perfectly working version in the Python distribution already:
Tools/scripts/md5sum.py
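
For illustration, here is a minimal sketch of the binary-mode point above.
It is not the code from either message or from md5sum.py; it assumes a
modern Python 3 with the hashlib module (at the time of this thread the
module was called md5), and the names md5_of_file and chunk_size are made
up for the example.

```python
import hashlib


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of the file at `path`.

    Hypothetical helper for illustration: the file is read in fixed-size
    chunks so that large files never have to fit in memory.
    """
    digest = hashlib.md5()
    # Binary mode ('rb') so the bytes are hashed exactly as stored;
    # text mode on Windows translates line endings.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file("some_image.png") (the file name is a placeholder)
should give the same hex string that md5sum prints for that file, and the
result could then be used as an ETag value.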

Thomas


