comparing multiple copies of terabytes of data?

Istvan Albert ialbert at mailblocks.com
Tue Oct 26 09:50:08 EDT 2004


Josiah Carlson wrote:

> The code to do so is simple:

...
 >     p = -1
 >     good = 1
 >     while f1.tell() < p:
 >         p = f1.tell()
 >         if f1.read(b) == f2.read(b) == f3.read(b):
 >             continue

...

What is slightly amusing is that your *simple*
solution is actually incorrect. You got the
comparison backwards in the while loop: with p = -1,
f1.tell() < p is false on the first test, so the
loop body never runs.

Another functional deficiency compared to
what cmp reports is that you don't know which
file has changed or which byte differs.
Adding that brings the potential for another
set of bugs. Then someone else comes along
who knows a little less about Python and adds
a little feature to the program that actually
silently breaks it ...
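
To give an idea of what "adding that" involves, here is a rough
sketch in present-day Python 3, not the code from the quoted post.
The name first_difference and the block_size default are made up
for illustration. It assumes regular files opened in binary mode,
where read() only returns a short block at end of file, and it only
reports which files disagree with the first one. It cannot tell
which copy is the "right" one.

```python
def first_difference(files, block_size=65536):
    """Return (offset, indexes of files that disagree with files[0]),
    or None if all files have identical content.

    `files` is a list of binary file objects positioned at the start.
    """
    offset = 0
    while True:
        blocks = [f.read(block_size) for f in files]
        if not any(blocks):
            # Every file reached EOF at the same point: no difference.
            return None
        if any(b != blocks[0] for b in blocks[1:]):
            # Narrow the mismatch down to the first differing byte.
            shortest = min(len(b) for b in blocks)
            for i in range(shortest):
                if any(b[i] != blocks[0][i] for b in blocks[1:]):
                    break
            else:
                # Common prefix matches; at least one file ended early.
                i = shortest
            # One-byte slices are empty past EOF, so short files show up too.
            odd = [n for n, b in enumerate(blocks)
                   if b[i:i + 1] != blocks[0][i:i + 1]]
            return offset + i, odd
        offset += len(blocks[0])
```

Called with three open files, it returns something like (offset, [2])
when only the third copy differs from the first at that byte. Even this
small version has to handle files of unequal length, which is exactly
the kind of corner where new bugs creep in.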

Whether or not it is actually faster remains
to be seen. And that was my whole point:
don't dismiss cmp too soon. See how it works,
test it, and then, armed with some real numbers,
one can make better decisions.

Istvan.


