a program to delete duplicate files

John Machin sjmachin at lexicon.net
Sat Mar 12 03:51:59 EST 2005


David Eppstein wrote:
> In article <4232079b$1 at news.vo.lu>,
>  Patrick Useldinger <pu.news.001 at gmail.com> wrote:
>
> > > Well, but the spec didn't say efficiency was the primary
> > > criterion, it said minimizing the number of comparisons was.
> >
> > That's exactly what my program does.
>
> If you're doing any comparisons at all, you're not minimizing the
> number of comparisons.

Gah. You two should be lawyers. I don't know why you're arguing about
whatever you are arguing about.

Just look at the efficiency of processing N files of the same size S,
where they first differ after d bytes [if they don't differ at all,
d = S]:

PU's method: O(Nd) reading time, O(Nd) data comparison time [actually
(N-1)d comparisons, which matters for small N and large d].
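
To make the trade-off concrete, here is a toy sketch of the comparison
approach (my own illustration, NOT PU's actual code; the function name
and block size are mine, and reopening each file per block is a
simplification his program presumably avoids). Group by size, then
split each group block by block, so a group stops being read as soon
as its members diverge:

import os
from collections import defaultdict

BLOCKSIZE = 8192

def duplicates_by_comparison(paths):
    # Group candidates by size first; only same-size files can match.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    dupes = []
    for size, group in by_size.items():
        if len(group) < 2:
            continue
        partitions = [group]   # groups still indistinguishable so far
        offset = 0
        while offset < size and partitions:
            survivors = []
            for part in partitions:
                by_block = defaultdict(list)
                for p in part:
                    with open(p, 'rb') as f:
                        f.seek(offset)
                        by_block[f.read(BLOCKSIZE)].append(p)
                # Only sub-groups with 2+ members need further reading;
                # a file that diverged here is never read again.
                survivors.extend(g for g in by_block.values() if len(g) > 1)
            partitions = survivors
            offset += BLOCKSIZE
        dupes.extend(partitions)   # each surviving group is all-identical
    return dupes

Reading stops for a group roughly d bytes in, which is where the O(Nd)
comes from.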

Hashing method: O(NS) reading time, O(NS) hash calc time -- every byte
of every candidate file must be read and hashed before any grouping by
digest can happen.
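
For contrast, the hashing approach as a similar toy sketch (again my
own illustration, not anyone's posted code; hashlib is the modern
module name):

import hashlib
import os
from collections import defaultdict

def duplicates_by_hash(paths, blocksize=8192):
    # Same size grouping as above; hashing doesn't change that part.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    dupes = []
    for group in by_size.values():
        if len(group) < 2:
            continue
        by_digest = defaultdict(list)
        for p in group:
            h = hashlib.md5()
            with open(p, 'rb') as f:
                for block in iter(lambda: f.read(blocksize), b''):
                    h.update(block)   # all S bytes, no early exit
            by_digest[h.digest()].append(p)
        # Equal digests mean only probably-equal files; a paranoid
        # version would byte-compare the members of each group anyway.
        dupes.extend(g for g in by_digest.values() if len(g) > 1)
    return dupes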

1. d <= S, always, so the comparison method never reads more data than
the hashing method does.
2. Anyone have a usable hash function that's faster than string
comparison? (Quick timing sketch below.)
3. Anyone have an answer to PU's request for advice on where his
algorithm could be improved by using a hash method?
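
Re point 2, here's a quick-and-dirty way to measure it yourself (two
equal 1 MiB buffers are the worst case for comparison, since every
byte must be examined; I'd expect plain comparison to win comfortably,
but run it on your own box):

import hashlib
import timeit

buf_a = bytes(range(256)) * 4096   # 1 MiB
buf_b = bytes(range(256)) * 4096   # equal content, distinct object

t_cmp = timeit.timeit(lambda: buf_a == buf_b, number=1000)
t_md5 = timeit.timeit(lambda: hashlib.md5(buf_a).digest(), number=1000)
print("compare %.4fs  md5 %.4fs" % (t_cmp, t_md5))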

Temporary injunction in favour of PU.



