binary file compare...

Peter Otten __peter__ at web.de
Mon Apr 13 17:25:37 EDT 2009


Grant Edwards wrote:

> On 2009-04-13, Grant Edwards <invalid at invalid> wrote:
>> On 2009-04-13, SpreadTooThin <bjobrien62 at gmail.com> wrote:
>>
>>> I want to compare two binary files and see if they are the same.
>>> I see the filecmp.cmp function but I don't get a warm fuzzy feeling
>>> that it is doing a byte by byte comparison of two files to see if they
>>> are the same.
>>
>> Perhaps I'm being dim, but how else are you going to decide if
>> two files are the same unless you compare the bytes in the
>> files?
>>
>> You could hash them and compare the hashes, but that's a lot
>> more work than just comparing the two byte streams.
>>
>>> What should I be using if not filecmp.cmp?
>>
>> I don't understand what you've got against comparing the files
>> when you stated that what you wanted to do was compare the files.
> 
> Doh!  I misread your post and thought you weren't getting a
> warm fuzzy feeling _because_ it was doing a byte-by-byte
> compare. Now I'm a bit confused.  Are you under the impression
> it's _not_ doing a byte-by-byte compare?  Here's the code:
> 
> def _do_cmp(f1, f2):
>     bufsize = BUFSIZE
>     fp1 = open(f1, 'rb')
>     fp2 = open(f2, 'rb')
>     while True:
>         b1 = fp1.read(bufsize)
>         b2 = fp2.read(bufsize)
>         if b1 != b2:
>             return False
>         if not b1:
>             return True
>
> It looks like a byte-by-byte comparison to me.  Note that when
> this function is called the file lengths have already been
> compared and found to be equal.

But there's a cache. filecmp remembers the result for a pair of files, and a
change of file contents may go undetected as long as the file stats (type,
size, and modification time) don't change:

$ cat fool_filecmp.py
import filecmp, shutil, sys

for fn in "adb":
    with open(fn, "w") as f:
        f.write("yadda")

shutil.copystat("d", "a")
filecmp.cmp("a", "b", False)

with open("a", "w") as f:
    f.write("*****")
shutil.copystat("d", "a")

if "--clear" in sys.argv:
    print "clearing cache"
    filecmp._cache.clear()

if filecmp.cmp("a", "b", False):
    print "file a and b are equal"
else:
    print "file a and b differ"
print "a's contents:", open("a").read()
print "b's contents:", open("b").read()

$ python2.6 fool_filecmp.py
file a and b are equal
a's contents: *****
b's contents: yadda

Oops. If you are paranoid you have to clear the cache before doing the
comparison:

$ python2.6 fool_filecmp.py --clear
clearing cache
file a and b differ
a's contents: *****
b's contents: yadda
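
If you would rather not poke at a private attribute, another option is to do
the comparison yourself and skip the cache entirely. The following is only a
minimal sketch along the lines of the _do_cmp() loop quoted above; the name
files_equal and the default buffer size are made up for illustration and are
not part of filecmp:

```python
import os


def files_equal(path1, path2, bufsize=8192):
    """Return True if both files hold identical bytes.

    Hypothetical helper, not part of filecmp: it keeps no cache,
    so every call re-reads both files from disk.
    """
    # Files of different size can never be equal, so skip the reads.
    if os.path.getsize(path1) != os.path.getsize(path2):
        return False
    with open(path1, "rb") as fp1:
        with open(path2, "rb") as fp2:
            while True:
                b1 = fp1.read(bufsize)
                b2 = fp2.read(bufsize)
                if b1 != b2:
                    return False
                if not b1:
                    # Both files ran out at the same point: identical.
                    return True
```

Calling files_equal("a", "b") in place of the final filecmp.cmp() call in
fool_filecmp.py should report a difference without any cache clearing. Newer
Python versions (3.4 and later) also offer a public filecmp.clear_cache().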

Peter


