[perl-python] a program to delete duplicate files
John Bokma
postmaster at castleamber.com
Fri Mar 11 00:10:35 EST 2005
David Eppstein wrote:
> In article <1110372973.657649.212920 at l41g2000cwc.googlegroups.com>,
> "Xah Lee" <xah at xahlee.org> wrote:
>
>> a absolute requirement in this problem is to minimize the number of
>> comparison made between files. This is a part of the spec.
>
> You need do no comparisons between files. Just use a sufficiently
> strong hash algorithm (SHA-256 maybe?) and compare the hashes.
I did it as follows (some time ago):

- Is the file size already in the hash?
- If so, calculate the MD5 (and store it); if the MD5s are equal, then
  compare the files.
- Store the info in the hash.
In some cases it might be faster to drop the MD5 step (since it reads all
the data, whereas a direct comparison can stop at the first difference).
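
A minimal Python sketch of the size-then-MD5-then-compare idea, for illustration only. It is not the original Perl script; the names find_duplicates and md5_of, the chunk size, and the choice to skip symlinks are all assumptions made here. Files are grouped by size first, MD5 is computed only for files that share a size, and a byte-for-byte comparison runs only when the MD5s match.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return (original, duplicate) path pairs found under root.

    Hypothetical helper: the name and return shape are assumptions.
    """
    # Step 1: group files by size; a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # no hashing and no comparison needed
        # Step 2: within a size group, hash and remember distinct files.
        by_md5 = defaultdict(list)
        for path in paths:
            originals = by_md5[md5_of(path)]
            for original in originals:
                # Step 3: equal MD5, so confirm with a full comparison.
                if filecmp.cmp(original, path, shallow=False):
                    duplicates.append((original, path))
                    break
            else:
                originals.append(path)  # first file seen with this content
    return duplicates
```

Calling find_duplicates("some/dir") only reports the pairs; deleting the second path of each pair is left to the caller. Dropping the MD5 step would mean calling filecmp.cmp directly on files of equal size.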
--
John
Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html