[perl-python] a program to delete duplicate files
Christos TZOTZIOY Georgiou
tzot at sil-tec.gr
Fri Mar 11 17:48:19 EST 2005
On Fri, 11 Mar 2005 11:07:02 -0800, rumours say that David Eppstein
<eppstein at ics.uci.edu> might have written:
>More seriously, the best I can think of that doesn't use a strong slow
>hash would be to group files by (file size, cheap hash) then compare
>each file in a group with a representative of each distinct file found
>among earlier files in the same group -- that leads to an average of
>about three reads per duplicated file copy: one to hash it, and two for
>the comparison between it and its representative (almost all of the
>comparisons will turn out equal but you still need to check unless you
>use a strong hash).
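A minimal sketch of the scheme described in the quoted paragraph follows. It assumes CRC32 (zlib.crc32) over the whole file as the "cheap hash" and uses filecmp.cmp(..., shallow=False) for the byte-by-byte check against each representative. The function names cheap_hash and find_duplicates are hypothetical, not taken from any posted program.

```python
import filecmp
import os
import zlib
from collections import defaultdict


def cheap_hash(path, chunk_size=65536):
    """CRC32 over the whole file (assumed cheap hash): fast, but too weak to prove equality."""
    crc = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(block, crc)
    return crc


def find_duplicates(paths):
    """Return lists of byte-identical files (each list has 2+ members)."""
    # Step 1: group files by (file size, cheap hash); one read per file.
    buckets = defaultdict(list)
    for p in paths:
        buckets[(os.path.getsize(p), cheap_hash(p))].append(p)

    duplicates = []
    for candidates in buckets.values():
        # Step 2: each inner list holds one distinct content; its first
        # element is the representative the later files are compared with.
        classes = []
        for p in candidates:
            for cls in classes:
                # Full comparison against the representative (two more reads).
                if filecmp.cmp(cls[0], p, shallow=False):
                    cls.append(p)
                    break
            else:
                classes.append([p])  # first file with this content in the group
        duplicates.extend(cls for cls in classes if len(cls) > 1)
    return duplicates
```

Since almost every comparison inside a group succeeds, the cost per duplicate is roughly the three reads mentioned above: one for the hash and two for the comparison.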
The code I posted in another thread (and linked to in this one) does
exactly that (a quick hash of the first few K before calculating the whole
file's md5 sum). However, Patrick's code is faster, reading only what's
necessary (he does what I intended to do, but I was too lazy; I actually
rewrote from scratch one of the first programs I wrote in Python, which
was obviously too amateurish for me to publish :)
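The quick-hash-then-md5 approach can be sketched roughly as below. This is not the posted program: the 8 KiB prefix size (QUICK_BYTES) and the helper names _md5, _refine and find_duplicates_md5 are assumptions for illustration. Each pass splits the surviving groups by a stronger key, and only files that still collide after the quick prefix hash get a full-file md5. Equality here is decided by matching md5 digests, i.e. by trusting a strong hash rather than comparing bytes.

```python
import hashlib
import os
from collections import defaultdict

QUICK_BYTES = 8192  # assumed value for "the first few K"


def _md5(path, nbytes=None, chunk_size=65536):
    """md5 of the first nbytes of a file, or of the whole file if nbytes is None."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        if nbytes is not None:
            h.update(f.read(nbytes))
        else:
            for block in iter(lambda: f.read(chunk_size), b""):
                h.update(block)
    return h.hexdigest()


def _refine(groups, key):
    """Split each group by key(path), dropping buckets with a single file."""
    refined = []
    for group in groups:
        buckets = defaultdict(list)
        for p in group:
            buckets[key(p)].append(p)
        refined.extend(b for b in buckets.values() if len(b) > 1)
    return refined


def find_duplicates_md5(paths):
    groups = [list(paths)]
    groups = _refine(groups, os.path.getsize)                    # no reads needed
    groups = _refine(groups, lambda p: _md5(p, QUICK_BYTES))    # quick prefix hash
    groups = _refine(groups, _md5)                              # whole-file md5
    return groups
```

Ordering the passes from cheapest to most expensive means most non-duplicates are eliminated by size alone, and the full read is only paid for files that agree on their first few K.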
It seems your objections are related to Xah Lee's specifications; I have no
objections to your objections (-:) other than that we are just trying to produce
something of practical value out of an otherwise doomed thread...
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...