md5 and large files

Tobias Pfeiffer me at privacy.net
Mon Oct 18 09:10:26 EDT 2004


Hi!

On 18 Oct 2004, Roger Binns wrote:

> Brad Tilley wrote:
>> I would like to verify that the files are not corrupt so what's the
>> most efficient way to calculate md5 sums on 4GB files? The machine
>> doing the calculations is a small desktop with 256MB of RAM.
> 
> If you need to be 100% certain, then only doing md5sum over the
> entire file will work as Tim points out.

This is not true. md5 produces only 128 bits, so by the pigeonhole 
principle there are quite a lot of 2 GB files that produce the same md5 
hash...

I think he should consider what he really wants to do with those files. 
If the goal is "compute the md5sum", then a loop with md5.update() seems 
most appropriate to me. If the goal is "check equality" or "check whether 
they are corrupted", why md5 at all? He can just read small blocks from 
both files and do a simple string comparison. That might even be faster. 
And here, the chance is really close to 100% he'd notice a change in the 
files. :-)
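A minimal sketch of both approaches (using the modern hashlib module in 
place of the old md5 module; the function names and the 1 MB block size 
are my own choices, not anything from the thread):

```python
import hashlib

def md5_of_file(path, blocksize=1 << 20):
    # Feed the file to md5 in fixed-size blocks, so memory use stays
    # at one block no matter how large the file is.
    h = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            block = f.read(blocksize)
            if not block:
                break
            h.update(block)
    return h.hexdigest()

def files_equal(path_a, path_b, blocksize=1 << 20):
    # Direct block-by-block comparison: no hashing at all, and it can
    # stop early at the first block that differs.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(blocksize)
            b = fb.read(blocksize)
            if a != b:
                return False
            if not a:  # both files exhausted at the same point
                return True
```

Either way, only one block is ever held in memory, so a 256 MB desktop 
handles a 4 GB file without trouble.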

Bye
Tobias

-- 
please send any mail to botedesschattens(at)web(dot)de
