sorting with expensive compares?

Steve Holden steve at holdenweb.com
Fri Dec 23 13:42:04 EST 2005


bonono at gmail.com wrote:
> Dan Stromberg wrote:
[...]
>>I've been using the following compare function, which in short checks, in
>>order:
>>
>>1) device number
>>2) inode number
>>3) file length
>>4) the beginning of the file
>>5) an md5 hash of the entire file
>>6) the entire file
[...]
> Why would #5 not be enough as an indicator that the files are identical?
> 
Because it doesn't guarantee that the files are identical. It indicates, 
to a very high degree of probability (particularly when the file lengths 
are equal), that the two files are the same, but it doesn't guarantee it.

Technically there are an infinite number of inputs that can produce the 
same md5 hash, since the digest is a fixed 128 bits while the inputs can 
be arbitrarily long.
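
A rough sketch of why the final full comparison is kept as the last word: 
equal lengths and equal md5 digests only make identity very likely, so a 
byte-by-byte check settles it. The names md5_of and files_identical are 
made up for illustration, and this is not Dan's actual compare function, 
just one way the last three checks might look in a current Python.

```python
import filecmp
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """Hypothetical helper: cheap checks first, full comparison last."""
    # Different lengths prove the files differ.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # Different digests also prove the files differ.
    if md5_of(path_a) != md5_of(path_b):
        return False
    # Equal digests are only strong evidence, so compare the contents.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

The first two tests can only ever prove that two files differ; only the 
last one can prove that they are the same.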

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/



