python snippet request: calculate MD5 checksum on 650 MB ISO cdrom image quickly

Radovan Garabik garabik-news at spam.melkor.dnp.fmph.uniba.sk
Wed Oct 25 02:54:24 EDT 2000


Tim Peters <tim_one at email.msn.com> wrote:
 : [Warren Postma]
 :> I am writing some python scripts to manage downloading (and
 :> re-downloading) ISO images from FTP mirrors, and doing MD5 checksums
 :> on the received files to make sure they are intact.
 :>
 :> I noticed that there is an MD5 message digest module in Python.
 :> But it only accepts STRINGS.  Is there some way to pass a WHOLE FILE to
 :> it, less awkwardly than having a WHILE loop that reads 1k chunks and
 :> passes it along to the MD5 module.

 : You can read the entire file into a string at one gulp, via e.g. f.read().
 : One-liner.

ISO image? The typical size of an ISO image is about 650 MB :-)

 :> A new function in the md5 module, md5.md5file(filename) would be nice,
 :> if anyone is listening, for a future python 2.x release.
 :>
 :> I'll contribute a patch if anyone thinks it's a good idea.

 : Alas, I don't:  there's no magic to be had here.  Such a function will have
 : to make up its own policy for chunking the file input, and one size doesn't
 : fit all.  The "while" loop is trivial to write, and really has no bad effect
 : on speed even if written in Python (1Kb chunks are very small, btw -- why
 : not use 64Kb, or 1Mb, chunks?  the "sweet spot" on your system is something
 : you can determine (see below), but a builtin md5file method can't guess for
 : you).

The best chunk size is a multiple of the filesystem block size, though
larger multiples will not give you much better performance than
reading exactly one block at a time.
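
For illustration, here is a minimal sketch of such a loop. It is only one
way to do it, under a few assumptions: it uses hashlib (the modern
replacement for the md5 module discussed in this thread), the names
md5_file and blocks_per_chunk are made up for this example, and the
4096-byte fallback is a guess for platforms where os.stat() does not
report st_blksize.

```python
import hashlib
import os


def md5_file(path, blocks_per_chunk=16):
    """Return the hex MD5 digest of the file at `path`, read in chunks.

    Hypothetical helper: the chunk size is a multiple of the block size
    reported by the filesystem, so memory use stays small even for a
    650 MB image.
    """
    # st_blksize is the preferred I/O block size; it is missing on some
    # platforms, so fall back to 4096 bytes (an assumption).
    block_size = getattr(os.stat(path), "st_blksize", 0) or 4096
    chunk_size = block_size * blocks_per_chunk

    digest = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            digest.update(chunk)
    return digest.hexdigest()
```

Called as md5_file("image.iso"), this returns the same hex string as
hashing the whole file at once, without holding the file in memory.
Whether blocks_per_chunk=16 or some other multiple is fastest is
something to measure on your own system.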


-- 
 -----------------------------------------------------------
| Radovan Garabik http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__    garabik @ melkor.dnp.fmph.uniba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!


