[perl-python] a program to delete duplicate files

John Bokma postmaster at castleamber.com
Fri Mar 11 00:10:35 EST 2005


David Eppstein wrote:

> In article <1110372973.657649.212920 at l41g2000cwc.googlegroups.com>,
>  "Xah Lee" <xah at xahlee.org> wrote:
> 
>> a absolute requirement in this problem is to minimize the number of
>> comparison made between files. This is a part of the spec.
> 
> You need do no comparisons between files.  Just use a sufficiently 
> strong hash algorithm (SHA-256 maybe?) and compare the hashes.

I did it as follows (some time ago):

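For each file:

    Is the file size already in the hash?

        If so, calculate the md5 (and store it); if it equals the md5
        of another file of that size, compare the files themselves.

    Store the file's info in the hash.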

In some cases it might be faster to drop the md5 (since computing it reads
all of the file's data).
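
The steps above can be sketched roughly as follows in Python. This is only a minimal illustration, not the original script: the names find_duplicates and md5_of are hypothetical, and it assumes that symbolic links should be skipped and that the first file in sorted order is the one to keep. It reports duplicates rather than deleting them.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return a list of (kept_path, duplicate_path) pairs found under root."""
    # Step 1: group by size; a file with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: only regular files count, symlinks are ignored.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size: the file is never read or hashed
        # Step 2: hash only the files that share a size.
        by_digest = defaultdict(list)  # digest -> distinct files seen so far
        for path in sorted(paths):
            originals = by_digest[md5_of(path)]
            # Step 3: confirm an MD5 match with a full byte-by-byte comparison.
            for original in originals:
                if filecmp.cmp(original, path, shallow=False):
                    duplicates.append((original, path))
                    break
            else:
                # No identical file seen yet: remember this one as an original.
                originals.append(path)
    return duplicates
```

Calling find_duplicates(".") returns the pairs without touching anything on disk; deleting the second path of each pair would be a separate step. When only two files share a size, skipping md5_of and calling filecmp.cmp directly avoids reading both files twice, which is the case where dropping the md5 pays off.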

-- 
John                   Small Perl scripts: http://johnbokma.com/perl/
               Perl programmer available:     http://castleamber.com/
            Happy Customers: http://castleamber.com/testimonials.html
                        


