filecmp.cmp() doesn't seem to do what it says in the documentation

Carl Banks pavlovevidence at gmail.com
Mon Sep 6 19:40:07 EDT 2010


On Sep 6, 1:11 pm, tinn... at isbd.co.uk wrote:
> Terry Reedy <tjre... at udel.edu> wrote:
> > On 9/6/2010 1:18 PM, tinn... at isbd.co.uk wrote:
> > > I'm using filecmp.cmp() to compare some files (surprise!).
>
> > > The documentation says:-
> > >      Unless shallow is given and is false, files with identical
> > >      os.stat() signatures are taken to be equal.
>
> > Reword and read carefully: if shallow == True and signatures are
> > identical, then files are taken to be equal.
>
> > Here is the corresponding code from Lib/filecmp.py:
> >      if shallow and s1 == s2:
> >          return True
>
> > Does not say the result for non-identical signatures ;-).
>
> > > I'm not setting shallow explicitly so it's True, thus the function
> > > should be comparing the os.stat() results.  However this doesn't seem
> > > to be the case as even if I touch one of the files to change its
> > > access/modification date filecmp.cmp() still returns True.
>
> > Because it goes on to actually compare the files, and they are equal.
>
> > ...
> >      result = _cache.get((f1, f2))
> >      if result and (s1, s2) == result[:2]:
> >          return result[2]
> >      outcome = _do_cmp(f1, f2)
> >      _cache[f1, f2] = s1, s2, outcome
> >      return outcome
>
> > Most of the stdlib files in Python are quite readable. I recommend it
> > when you have questions.
>
> Well I still don't think it's what the documentation says, it would be
> much better if it told you that 'if the os.stat() signatures are not
> identical then the file contents are actually compared'.  The
> implication to me when I read the documentation was that if shallow
> was True and the os.stat() signatures were not identical then False
> would be returned.  Where does it say otherwise?

To me, "comparing files" means to compare the contents and nothing
else, so when documentation says "Compare the files named f1 and f2" I
think it has that covered.  I understand the os.stat comparison to be
a (non-foolproof) optimization.
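
Here is a quick sketch that shows the behaviour being discussed. It
assumes a writable temporary directory; the file names and the helper
name demo_shallow_cmp are made up for illustration. Two files get the
same contents but different modification times, so their stat
signatures differ, and filecmp.cmp() should still report them as equal
because it falls through to comparing the contents.

```python
import filecmp
import os
import tempfile


def demo_shallow_cmp():
    """Compare two files with equal contents but different mtimes.

    Returns (mtimes_equal, shallow_result, deep_result).
    """
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")  # hypothetical file names
        b = os.path.join(d, "b.txt")
        for path in (a, b):
            with open(path, "w") as f:
                f.write("same contents\n")

        # Force different access/modification times, like "touch" would.
        os.utime(a, (1000000000, 1000000000))
        os.utime(b, (1000000100, 1000000100))

        mtimes_equal = os.stat(a).st_mtime == os.stat(b).st_mtime
        shallow_result = filecmp.cmp(a, b)  # shallow defaults to True
        deep_result = filecmp.cmp(a, b, shallow=False)
        return mtimes_equal, shallow_result, deep_result


print(demo_shallow_cmp())
```

The mtimes differ, yet both calls return True, which matches the code
Terry quoted: a matching signature is only a shortcut to True, never a
shortcut to False.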


Anyway, if you only want to compare the os.stat results, you should
use os.stat directly.

os.stat(filename1) == os.stat(filename2)


Then if you want, you can write a function to compare only the stats
you are interested in.

import os

def mystatcmp(filename1, filename2):
    # Compare only size and modification time; ignore all other stat fields.
    s1 = os.stat(filename1)
    s2 = os.stat(filename2)
    return s1.st_size == s2.st_size and s1.st_mtime == s2.st_mtime
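
One caveat worth keeping in mind: as far as I know, comparing whole
os.stat() results also compares fields such as the inode number and
the access time, so two distinct files will rarely come out equal that
way. Picking the fields yourself avoids that. The sketch below uses
file type, size and modification time, which is my reading of what
filecmp treats as the "signature"; treat that as an assumption and
check Lib/filecmp.py for your version. The names stat_signature,
same_signature and demo_same_signature are hypothetical.

```python
import os
import stat
import tempfile


def stat_signature(path):
    """Hypothetical helper: (file type, size, mtime) for path."""
    st = os.stat(path)
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)


def same_signature(path1, path2):
    """True if both paths have the same type, size and mtime."""
    return stat_signature(path1) == stat_signature(path2)


def demo_same_signature():
    """Return the comparison result before and after changing one mtime."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")  # hypothetical file names
        b = os.path.join(d, "b.txt")
        for path in (a, b):
            with open(path, "w") as f:
                f.write("same contents\n")

        # Give both files the same timestamps, then compare.
        os.utime(a, (1000000000, 1000000000))
        os.utime(b, (1000000000, 1000000000))
        before = same_signature(a, b)

        # Change one file's mtime, like "touch" would, and compare again.
        os.utime(b, (1000000100, 1000000100))
        after = same_signature(a, b)
        return before, after


print(demo_same_signature())
```

Unlike filecmp.cmp(), this never reads the file contents, so a changed
mtime alone is enough to make it return False.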


Carl Banks


