md5 and large files

Nelson Minar nelson at monkey.org
Tue Oct 19 18:56:45 EDT 2004


Tobias Pfeiffer <me at privacy.net> writes:
> I have no clue about how the md5 algorithm works, but I'd
> think one could prove that with a number large enough, every hash
> occurs twice. After all, md5 is not random.

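MD5 is deterministic, but its output is meant to look random, and it
does that very well. And yes, there are only 2^128 possible hashes, so
the same 128-bit hash must occur for different inputs. The trick is
that it's meant to be computationally infeasible to construct an input
that produces a given hash. Collision resistance is shakier:
researchers have shown that pairs of colliding MD5 inputs can be
constructed, so don't rely on MD5 where that matters. For more, see
here: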
  http://en.wikipedia.org/wiki/Md5
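
To make the "same hash for different inputs" point concrete, here is a
minimal sketch that shrinks the problem: it keeps only the first 2
bytes of each MD5 digest, so there are just 65536 possible values and
a repeat is guaranteed within 65537 inputs. The names truncated_md5
and find_truncated_collision are made up for illustration, and
hashlib is assumed to be available (older Pythons shipped a separate
md5 module instead). Nothing here finds a collision on the full
128-bit hash.

```python
import hashlib
from itertools import count


def truncated_md5(data, nbytes=2):
    """Return only the first nbytes of the MD5 digest of data."""
    return hashlib.md5(data).digest()[:nbytes]


def find_truncated_collision(nbytes=2):
    """Return two different inputs whose truncated digests are equal."""
    seen = {}  # truncated digest -> first input that produced it
    for i in count():
        msg = str(i).encode("ascii")
        tag = truncated_md5(msg, nbytes)
        if tag in seen:
            return seen[tag], msg
        seen[tag] = msg


a, b = find_truncated_collision()
print(a, b, truncated_md5(a).hex())
```

The full hash works the same way, only with 2^128 slots instead of
2^16, which is why stumbling on a repeat by accident is not a
practical worry.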

PS: I was wrong when I said that a native md5sum would be
significantly faster than the Python version. It is a bit faster, but
not much. Both the native program and the Python program spend nearly
all their CPU time in the native-code MD5 calculation function.
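
For reference, a minimal sketch of what the Python side might look
like for a large file. This is not the script timed above: the
function name md5_of_file and the 1 MB chunk size are assumptions
chosen for illustration, and it uses hashlib, which newer Pythons
provide. Reading in chunks keeps memory use flat, and each update()
call hands the bytes straight to the native MD5 code.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read one chunk at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            digest.update(chunk)  # the hashing itself runs in native code
    return digest.hexdigest()
```

Called as md5_of_file("some_large_file.iso") (a hypothetical path), it
should return the same hex string that md5sum prints for that file.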



