md5 and large files

Tobias Pfeiffer me at privacy.net
Mon Oct 18 13:18:45 EDT 2004


Hi!

On 18 Oct 2004, Peter L Hansen wrote:
> Tobias Pfeiffer wrote:
>> This is not true. I'd say there are quite a lot of 2 GB files that 
>> produce the same md5 hash...
> 
> Without deliberately contriving an example using the recently
> discovered technique, can you offer even a single example? ;-)

Of course I can't... *grin* But (correct me if I'm wrong) an MD5 sum is 
128 bits long, which gives 2^128 different possible values. A 2 GB file 
has 8*2*1024^3 = 17179869184 bits, which gives 2^17179869184 different 
possible 2 GB files. Am I right in thinking that, on average, the number 
of files sharing any one md5 sum is then 2^17179869184 / 2^128 = 
2^17179869056?
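
To check the arithmetic, here is a small sketch. It assumes "2 GB" means 
2*1024^3 bytes and only works with the exponents, since the numbers 
themselves are far too large to print. The variable names are made up for 
illustration.

```python
# Sizes in bits; "2 GB" is assumed to mean 2 * 1024**3 bytes.
MD5_BITS = 128
FILE_BITS = 8 * 2 * 1024**3

# There are 2**FILE_BITS possible files and 2**MD5_BITS possible sums,
# so on average 2**(FILE_BITS - MD5_BITS) files share each sum.
# Only the exponent is computed; the full number is astronomically large.
avg_files_per_sum_exponent = FILE_BITS - MD5_BITS

print(FILE_BITS)                   # 17179869184
print(avg_files_per_sum_exponent)  # 17179869056
```

This is a pigeonhole average only; it says nothing about how hard it is 
to actually find two such files.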

> (If you were trying to point out that 100.00000000000% or whatever
> is not possible with MD5, okay, but note that Roger didn't specify
> the precision.  100% is close enough to what you'd get with MD5.)

And am I also right in thinking that the probability of two given files 
having the same md5 is then 2^-128? OK, I have to admit that this chance 
is small enough... *grin*
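
To put a number on that, a rough sketch: it assumes MD5 behaves like an 
ideal, uniformly random function and that the two files are different and 
not deliberately constructed to collide. hashlib is the standard-library 
module in current Python versions; the variable names are mine.

```python
import hashlib
from fractions import Fraction

# An MD5 digest is 16 bytes, i.e. 128 bits.
digest_bits = hashlib.md5().digest_size * 8

# Assumption: an idealized hash, so two unrelated files collide
# with probability 1 / 2**128.
p_same = Fraction(1, 2**digest_bits)

print(digest_bits)
print(float(p_same))  # a value on the order of 1e-39
```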

> Simple string comparisons with *what*?  Are you assuming that there
> is a known-good copy of the file sitting right next to it, that he
> can compare against?

That is why I said he has to know the goal of his project. I think if he 
wants to compare the first 4K of the files, he also needs a known-good 
copy.
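
Just as a sketch of what such a comparison could look like, assuming a 
known-good copy is available on disk. The function name and the throwaway 
demo files are hypothetical, not anything from Roger's project.

```python
import os
import tempfile

def first_block_matches(path, known_good_path, size=4096):
    """Return True if the first `size` bytes of both files are identical."""
    with open(path, "rb") as f, open(known_good_path, "rb") as g:
        return f.read(size) == g.read(size)

# Demo: two throwaway files that differ only after the first 4K.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.bin")
    b = os.path.join(tmp, "b.bin")
    with open(a, "wb") as f:
        f.write(b"x" * 4096 + b"tail-1")
    with open(b, "wb") as f:
        f.write(b"x" * 4096 + b"tail-2")
    same_prefix = first_block_matches(a, b)            # first 4K only
    same_8k = first_block_matches(a, b, size=8192)     # covers the tails

print(same_prefix, same_8k)
```

In the demo the 4K check passes although the files differ further on, 
which is why the goal of the project matters here.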

Bye
Tobias

-- 
please send any mail to botedesschattens(at)web(dot)de


