python snippet request: calculate MD5 checksum on 650 MB ISO cdrom image quickly

Gareth McCaughan Gareth.McCaughan at pobox.com
Tue Oct 24 18:39:08 EDT 2000


Tim Peters wrote:

[Warren Postma:]
> > A new function in the md5 module, md5.md5file(filename) would be nice,
> > if anyone is listening, for a future python 2.x release.
> >
> > I'll contribute a patch if anyone thinks it's a good idea.
> 
> Alas, I don't:  there's no magic to be had here.  Such a function will have
> to make up its own policy for chunking the file input, and one size doesn't
> fit all.  The "while" loop is trivial to write, and really has no bad effect
> on speed even if written in Python (1Kb chunks are very small, btw -- why
> not use 64Kb, or 1Mb, chunks?  the "sweet spot" on your system is something
> you can determine (see below), but a builtin md5file method can't guess for
> you).
..
> Why the emphasis on "SLOWLY" and "slow"?  You may be missing that md5 is
> *designed* to be slow <0.9 wink>!  It's supposed to give *such* a good hash
> that it's computationally intractable to fool it on purpose, and it does a
> lot of work to achieve that.  If you want a *faster* checksum, then e.g. use
> crc32 instead.

If it doesn't matter how slow MD5ing a file is, where's the
harm in providing an md5file function that might be 20% too
slow?[1] If it *does* matter how slow MD5ing a file is, then
note that most people will probably just guess at a buffer
size and use that, and it'll often be slower for them than
a reasonably sensible centrally-made guess would be.

So I'd be in favour of an md5file function. It would be
trivial to implement, it would probably reduce the total
number of cycles wasted while doing MD5s, it would save
lots of people writing a couple of unnecessary lines of
code each, and those who need optimum tuning can carry on
using what's already there. I see no downside.
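
For concreteness, here is a minimal sketch of what such a function might look like. It is not a patch anyone has submitted. It assumes hashlib, which superseded the md5 module in later Python versions, and the chunk_size parameter and its 64 KB default are guesses for illustration only, not measured values.

```python
import hashlib


def md5file(filename, chunk_size=64 * 1024):
    """Return the hex MD5 digest of the file at `filename`.

    chunk_size is a hypothetical tuning knob; the 64 KB default is
    a guess, not a benchmarked optimum.
    """
    digest = hashlib.md5()
    # Read in binary mode so the bytes hashed are exactly those on disk.
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            digest.update(chunk)
    return digest.hexdigest()
```

Anyone who has found a better buffer size for their own system could pass it as chunk_size, or carry on writing the loop by hand.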


[1] Number pulled out of thin air. I'd expect a smaller
    difference, actually.

-- 
Gareth McCaughan  Gareth.McCaughan at pobox.com
sig under construc


