Efficient MD5 (or similar) hashes

Bengt Richter bokr at oz.net
Sun Dec 7 22:21:46 EST 2003


On Sun, 07 Dec 2003 19:49:58 -0500, Kamus of Kadizhar <yan at NsOeSiPnAeMr.com> wrote:

>Another newbie question:
>
>I have large files I'm dealing with.  Some 600MB -1.2 GB in size, over a 
>slow network.  Transfer of one of these files can take 40 minutes or an 
>hour.
>
>I want to check the integrity of the files after transfer.  I can check 
>the obvious - date, file size - quickly, but what if I want an MD5 hash?
>
> From reading the python docs, md5 reads the entire file as a string.
I don't know what docs you're reading, but if you read the docs on the
md5 module, you'll see you don't have to do that: a hash object can be
fed the data a chunk at a time with its update() method.
You could also interactively type help('md5'),
or import md5 followed by help(md5).
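To make that concrete, here is a minimal sketch of chunked hashing. (It uses hashlib, the module that later replaced the old md5 module in the standard library; the function name and chunk size are my own choices.)

```python
import hashlib

def md5_of_file(path, chunk_size=64 * 1024):
    """Hash a file incrementally, so the whole file never sits in memory."""
    h = hashlib.md5()  # with the old md5 module this would be md5.new()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            h.update(chunk)  # feed each chunk to the running digest
    return h.hexdigest()
```

Memory use stays at one chunk regardless of file size, so a 1 GB file is no problem.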

>That's not practical on a 1 GB file that's network mounted.
Well, whatever calculates the md5 will have to read all the bytes from the source
you want to check. If you have downloaded a file to another machine, then
the fastest approach will be to run the md5 calculation there; but if you have a
gigabit LAN connection and things aren't busy, I would think it wouldn't make much
difference if you read it over the network instead.

If you have a C/C++ executable utility that calculates md5, running that
directly on the file will probably be fastest. You can run it from Python via
popen, if that's the context you want to control it from.
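A sketch of that, assuming a `md5sum`-style tool is on the PATH (I use the subprocess module here rather than the bare popen call, but the idea is the same: capture the tool's stdout and take the digest field):

```python
import subprocess

def md5_via_tool(path):
    """Run the system md5sum binary and parse its 'digest  filename' output."""
    result = subprocess.run(["md5sum", path],  # assumes GNU md5sum is installed
                            capture_output=True, text=True, check=True)
    # md5sum prints e.g. "d41d8cd98f00b204e9800998ecf8427e  somefile"
    return result.stdout.split()[0]
```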

I think there are ways to use RPC to accomplish the same thing remotely, but I haven't played with that.
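One hedged sketch of the RPC idea, using the standard-library XML-RPC server: expose a function that hashes the file on the machine where it lives, so only the short digest crosses the network. (Function names, host, and port are my own; a real deployment would need authentication and path validation.)

```python
import hashlib
from xmlrpc.server import SimpleXMLRPCServer

def remote_md5(path, chunk_size=64 * 1024):
    """Server-side helper: hash the file where it lives, return the hex digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def serve(host="localhost", port=8000):
    # Hypothetical endpoint: the client would call
    #   xmlrpc.client.ServerProxy("http://host:8000").remote_md5("/some/file")
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(remote_md5, "remote_md5")
    server.serve_forever()
```

This does essentially what the inetd suggestion below would do, without writing a custom protocol.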

>
>The only thing I can think of is to set up an inetd daemon on the server 
>that will spit out the md5 hash if given the file path/name.
>
>Any other ideas?

Describe your setup in a little more detail. Someone has probably done it before.

Regards,
Bengt Richter




More information about the Python-list mailing list