Why checksum? [was Re: Fuzzy Lookups]

Steven D'Aprano steve at REMOVETHIScyber.com.au
Wed Feb 1 05:59:21 EST 2006


On Tue, 31 Jan 2006 13:38:50 -0800, Paul Rubin wrote:

> Steven D'Aprano <steve at REMOVETHIScyber.com.au> writes:
>> This isn't a criticism, it is a genuine question. Why do people compare
>> local files with MD5 instead of doing a byte-to-byte compare? Is it purely
>> a caching thing (once you have the checksum, you don't need to read the
>> file again)? Are there any other reasons?
> 
> It's not just a matter of comparing two files.  The idea is you have
> 10,000 local files and you want to find which ones are duplicates
> (i.e. if files 637 and 2945 have the same contents, you want to
> discover that).  The obvious way is to make a list of hashes and sort
> the list.
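
A minimal sketch of the hash-and-sort approach described in the quoted text, assuming a modern Python with hashlib. The names file_md5 and find_duplicates are made up for illustration and are not from the original thread. Matching digests only mark candidate duplicates.

```python
import hashlib
from itertools import groupby
from operator import itemgetter


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at path, reading in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return lists of paths that share an MD5 digest (hypothetical helper)."""
    # Build (digest, path) pairs and sort so equal digests end up adjacent.
    hashed = sorted((file_md5(p), p) for p in paths)
    groups = []
    for _, group in groupby(hashed, key=itemgetter(0)):
        members = [path for _, path in group]
        if len(members) > 1:
            groups.append(members)
    return groups
```

Each file is read once, so the cost grows with the number of files rather than with the number of pairs.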

Sure. But if you are just comparing two files, is there any reason to
bother with a checksum? (MD5 or other.)

I can't see any, but I thought maybe that's because I'm not thinking
outside the box.
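
For reference, a byte-to-byte comparison of two local files might look like the sketch below. The function name same_contents and the chunk size are assumptions for illustration. The standard library's filecmp.cmp(a, b, shallow=False) does a similar job.

```python
import os


def same_contents(path_a, path_b, chunk_size=65536):
    """Return True if the two files hold identical bytes (hypothetical helper)."""
    # Different sizes mean different contents; nothing needs to be read.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # stop at the first mismatch
            if not chunk_a:
                return True  # both files ended at the same point
```

Unlike hashing both files, this can stop at the first differing chunk, and it never reports a false match.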


-- 
Steven.



