md5 and large files

Peter Hickman peter at semantico.com
Mon Oct 18 11:04:36 EDT 2004


Tobias Pfeiffer wrote:
> I'd say there are quite a lot of 2 GB files that 
> produce the same md5 hash...

If you are concerned that two large files of exactly the same size might differ 
yet produce the same MD5, then please mail me copies of these files! MD5 
collisions must exist, since there are far more possible 2 GB files than 128-bit 
digests, but proof of one is rarely seen in the wild. If you are still 
concerned, write a program that reads each file in 32k chunks (for example), 
computes an MD5 of each chunk, and compares them. If the files are identical, 
they will match chunk for chunk.
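A minimal sketch of that chunk-by-chunk comparison, written for modern Python 3 with `hashlib` (which postdates this post; 2004-era Python had a separate `md5` module). The function names and the 32k default are illustrative choices, not an established API:

```python
import hashlib
from itertools import zip_longest

CHUNK_SIZE = 32 * 1024  # 32k chunks, as suggested above


def chunk_digests(path, chunk_size=CHUNK_SIZE):
    """Yield the MD5 digest of each successive chunk of a file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            yield hashlib.md5(chunk).digest()


def files_match_chunkwise(path_a, path_b, chunk_size=CHUNK_SIZE):
    """Return True if every chunk of the two files has the same MD5."""
    # zip_longest pads the shorter file with None, so a length
    # difference shows up as a mismatch instead of being ignored.
    for digest_a, digest_b in zip_longest(chunk_digests(path_a, chunk_size),
                                          chunk_digests(path_b, chunk_size)):
        if digest_a != digest_b:
            return False
    return True
```

Usage: `files_match_chunkwise("copy1.iso", "copy2.iso")`, where both paths are hypothetical. The function stops at the first differing chunk, so two files that differ early are rejected without reading them to the end.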

The trouble is that you now have to read and hash both files, rather than 
hashing one file and comparing its MD5 to a known value.
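For contrast, the one-file approach can still hash a 2 GB file without loading it into memory, by feeding the digest one chunk at a time. A minimal sketch in modern Python 3; `md5_of_file` is a hypothetical helper name and the 32k chunk size is arbitrary:

```python
import hashlib


def md5_of_file(path, chunk_size=32 * 1024):
    """Compute a file's MD5 hex digest, reading it in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel calls f.read() until it returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Usage: `md5_of_file("big.iso") == known_md5`, where the path and `known_md5` (for example, a checksum published alongside a download) are hypothetical. Calling `update()` repeatedly gives the same result as hashing the whole file at once.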
