Slight discrepancy with filecmp.cmp

John Machin sjmachin at lexicon.net
Mon Apr 18 01:21:44 EDT 2005


On Sun, 17 Apr 2005 22:06:04 -0600, Ivan Van Laningham
<ivanlan at pauahtun.org> wrote:
[snip]
> So I wrote a set of
>programs to both index the disk versions with the cd versions, and to
>compare, using filecmp.cmp(), the cd and disk version.  Works fine. 
>Turned up several dozen files that had been inadvertently rotated or
>saved with the wrong quality, various fat-fingered mistakes like that.
>
>However, it didn't flag the files that I know have bitrot.  I seem to
>remember that diff uses a checksum algorithm on binary files, not a
>byte-by-byte comparison.  Am I wrong?  

According to the docs:

"""
cmp( f1, f2[, shallow[, use_statcache]]) 

Compare the files named f1 and f2, returning True if they seem equal,
False otherwise. 
Unless shallow is given and is false, files with identical os.stat()
signatures are taken to be equal
"""

And what is an os.stat() signature, you ask? So did I.

According to the code itself:

import stat

def _sig(st):
    return (stat.S_IFMT(st.st_mode),  # file type (regular file, dir, ...)
            st.st_size,               # size in bytes
            st.st_mtime)              # time last modified

Looks like it assumes two files are the same if they have the same
file type, the same size, and the same last-modified time. Normally I
guess that's good enough, but bit rot changes neither the size nor the
timestamp, so a shallow comparison can't see it. Maybe the phantom
bit-toggler is bypassing the file system somehow. What OS are you running?
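
To make the shallow check concrete, here is a minimal sketch that fakes
a "rotted" copy: two files of the same size with one differing byte,
forced to the same mtime with os.utime(). The file names, the byte
values and the timestamp are made up for illustration; only
filecmp.cmp() and its shallow argument come from the library.

```python
import filecmp
import os
import tempfile


def demo_shallow_vs_deep():
    """Return (shallow_result, deep_result) for two near-identical files."""
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.jpg")      # hypothetical names
        rotted = os.path.join(tmp, "rotted.jpg")

        with open(good, "wb") as f:
            f.write(b"\x00\x01\x02\x03")
        with open(rotted, "wb") as f:
            f.write(b"\x00\x01\x02\x07")          # same length, one byte differs

        # Force identical timestamps so both os.stat() signatures match.
        stamp = 1000000000
        os.utime(good, (stamp, stamp))
        os.utime(rotted, (stamp, stamp))

        shallow = filecmp.cmp(good, rotted)                # signature check only
        deep = filecmp.cmp(good, rotted, shallow=False)    # reads the contents
        return shallow, deep


if __name__ == "__main__":
    print(demo_shallow_vs_deep())
```

The shallow call reports the files as equal because type, size and
mtime agree; the shallow=False call reads the bytes and spots the
difference.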

You might like to do two things: (1) run your comparison again with
shallow=False, and (2) submit a patch to the docs.
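
For (1), something along these lines might do. It is only a sketch: I
am assuming the disk and CD copies sit in two flat directories with
matching file names, and find_mismatches is a name I made up.
filecmp.cmpfiles() is the library call that compares a list of common
names in two directories.

```python
import filecmp
import os


def find_mismatches(disk_dir, cd_dir):
    """Compare same-named regular files in two directories by content.

    Returns (mismatch, errors): names whose bytes differ, and names
    that could not be compared at all (e.g. unreadable files).
    """
    common = sorted(set(os.listdir(disk_dir)) & set(os.listdir(cd_dir)))
    # Keep only names that are regular files on both sides.
    common = [
        name for name in common
        if os.path.isfile(os.path.join(disk_dir, name))
        and os.path.isfile(os.path.join(cd_dir, name))
    ]
    # shallow=False forces a read of the contents even when the
    # os.stat() signatures match.
    _match, mismatch, errors = filecmp.cmpfiles(
        disk_dir, cd_dir, common, shallow=False
    )
    return mismatch, errors
```

Anything in errors deserves a look too; a file that can't be read off
the CD at all is not good news either.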

(-:
You have of course attempted to eliminate other variables by checking
that the bit-rot effect is apparent using different display software,
a different computer, an observer who's not on the same medication as
you, ... haven't you?
:-)


HTH,
John



