binary file compare...

Grant Edwards invalid at invalid
Wed Apr 15 10:11:56 EDT 2009


On 2009-04-15, Martin <martin at marcher.name> wrote:
> On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano

> I'd still say rather burn CPU cycles than development hours (if I got
> the question right),

_Hours_?  Calling the filecmp module takes _one_line_of_code_.
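A sketch of that one line, assuming the standard library's filecmp
is the module meant.  Passing shallow=False makes filecmp.cmp compare
the actual contents instead of trusting matching os.stat()
signatures.  The wrapper name files_identical is hypothetical.

```python
import filecmp

def files_identical(path_a, path_b):
    # shallow=False forces a content comparison rather than treating
    # files with identical os.stat() signatures as equal.
    return filecmp.cmp(path_a, path_b, shallow=False)
```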

Implementing a file compare from scratch takes about a half
dozen lines of code.
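Something along these lines, as a minimal sketch: read both files in
fixed-size blocks and bail out at the first block that differs.  The
function name same_contents and the 64 KiB chunk_size are
illustrative choices, not anything from the thread.

```python
def same_contents(path_a, path_b, chunk_size=64 * 1024):
    """Return True if the two files are byte-for-byte identical."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # first differing block (or length mismatch)
            if not chunk_a:
                return True   # both files reached EOF together
```

A cheap os.path.getsize() check up front would catch files of
different lengths without reading them at all.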

> if not then with binary files you will have to find some way
> of representing differences between the 2 files in a readable
> manner anyway.

 1) Who said anything about a readable representation of the
    differences?

 2) How does a checksum provide that?    

>> Hashing is a *lot* more work than just comparing two bytes.
>> The MD5 checksum has been specifically designed to be fast and
>> compact, and the algorithm is still complicated:
>
> I know that the various checksum algorithms aren't exactly
> cheap, but I do think that just to know whether 2 files are
> different a solution which takes 5mins to implement wins
> against a lengthy discussion

Bah.  A direct compare is trivial.  The discussion of which
checksum to use, how to implement it, and how reliable it is
will be far longer than any discussion over a direct
comparison.
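For contrast, a checksum-based version might look like the sketch
below.  MD5 via hashlib is used only because it came up in the
thread, and the names md5_of and same_by_checksum are hypothetical.
Note that it must read every byte of both files before it can answer,
and the answer is still just "same" or "different".

```python
import hashlib

def md5_of(path, chunk_size=64 * 1024):
    # Hash the whole file incrementally; every byte has to be read.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_by_checksum(path_a, path_b):
    # Equal digests mean "identical" only with overwhelming probability:
    # MD5 collisions are known to exist.
    return md5_of(path_a) == md5_of(path_b)
```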

> which optimizes too early wins hands down.

Optimizes too early?  Comparing the bytes is the simplest, most
direct, and most obvious solution.  It takes one line of code to
call the filecmp module.  Implementing it from scratch takes
about five lines of code.

We all rail against premature optimization, but using a
checksum instead of a direct comparison is premature
unoptimization.  ;)

-- 
Grant Edwards                   grante             Yow! Hmmm ... A hash-singer
                                  at               and a cross-eyed guy were
                               visi.com            SLEEPING on a deserted
                                                   island, when ...



More information about the Python-list mailing list