signature for a file?

John Hunter jdhunter at ace.bsd.uchicago.edu
Wed Jul 31 12:10:19 EDT 2002


>>>>> "Shagshag13" == Shagshag13  <shagshag13 at yahoo.fr> writes:

    >> files. to do this i'm planning to write a python script that
    >> would compute a kind of CRC32, MD5 or SHA (i'm really not
    >> competent in that - so here i need advices and pointer to some
    >> implementations - and to know which is the best to had a
    >> unique unambiguous signature for a file) and then use it to
    >> find "doubles" : same size + same signature = probably same
    >> file.
    >> 
    >> That would be very useful indeed.  (Concurs another
    >> disorganized person :-)
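
A minimal sketch of the quoted plan (same size + same signature), assuming Python 3 and the standard hashlib module. SHA-256 stands in for the CRC32/MD5/SHA choice, and the helper names file_signature and find_doubles are hypothetical. A matching digest makes identical contents overwhelmingly likely but is not a proof, so a final byte-for-byte comparison remains the safe check.

```python
import hashlib
import os
from collections import defaultdict


def file_signature(path, algorithm="sha256", block_size=64 * 1024):
    """Hypothetical helper: return (size, hex digest) for the file at path."""
    # SHA-256 is an assumption here; any hashlib algorithm name works.
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size blocks so large files never have to fit in memory.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return os.path.getsize(path), digest.hexdigest()


def find_doubles(root):
    """Hypothetical helper: group files under root by (size, digest)."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_signature(path)].append(path)
    # Keep only signatures shared by two or more files ("doubles").
    return [paths for paths in groups.values() if len(paths) > 1]
```

Each group returned by find_doubles is a list of paths that share both size and digest.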

You may be interested in the dircmp class from the filecmp module,
which reports which files (by name) are common to directories A and B,
which are unique to A, which are unique to B, and so on...

  http://python.org/doc/current/lib/dircmp-objects.html

This standard-library module is (naturally) much more efficient than
the script I posted, because:

1) It only compares the contents of files whose sizes match.  There is
   no need to read a file that has no counterpart of the same size.

2) It compares identically sized files block by block instead of
   computing a checksum of each whole file.  There is no need to read
   any further once an early block differs.

3) It has a nice OO interface for getting lists of common and unique
   files, and it will recurse into subdirectories on request.
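
Points 1 and 2 can be applied to the original "find doubles" problem as well. The sketch below is an assumption-labeled illustration, not filecmp's own code: it buckets every file under a tree by size, then compares same-sized candidates with filecmp.cmp(..., shallow=False), which reads both files in blocks and stops at the first difference. Unlike dircmp, it finds duplicates stored under different names. The helper name find_duplicates is hypothetical, and Python 3 is assumed.

```python
import filecmp
import os
from collections import defaultdict


def find_duplicates(root):
    """Hypothetical helper: groups of paths under root with equal contents."""
    # Point 1: bucket files by size; a file with a unique size has no twin.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for paths in by_size.values():
        remaining = list(paths)
        while len(remaining) > 1:
            first = remaining.pop(0)
            # Point 2: shallow=False forces a block-by-block content read,
            # which returns False as soon as one block differs.
            same = [p for p in remaining if filecmp.cmp(first, p, shallow=False)]
            if same:
                duplicates.append([first] + same)
            remaining = [p for p in remaining if p not in same]
    return duplicates
```

No checksums are computed at all; files are only ever read when another file of the same size exists.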

>>> import filecmp
>>> x = filecmp.dircmp('mnet', 'mncvs')
>>> x.report_full_closure()
diff mnet mncvs
Only in mnet : ['.cvsignore', 'Broker', 'Broker.bat', 'COPYING', 'CREDITS', 'ChangeLog', 'GNUmakefile', 'MacOSX', 'artwork', 'client', 'common', 'contenttypes', 'hackerdocs', 'linux', 'localweb', 'mnmods', 'overview.txt', 'rmnlib', 'server', 'tarexclude.txt', 'utilscripts', 'win32', 'wxBroker']
Only in mncvs : ['extsrc', 'mnet']
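
The report above is meant for reading; in a script, the same information is available as attributes of the dircmp object. A minimal sketch, assuming Python 3: the collect helper is hypothetical, while left_only, right_only, diff_files and subdirs are documented dircmp attributes. Note that dircmp pairs entries by name, so it will not spot duplicates stored under different names, and by default it uses shallow comparison, which treats files with identical os.stat() signatures (file type, size, modification time) as equal without reading them.

```python
import filecmp


def collect(dcmp, rel="."):
    """Hypothetical helper: {relative_dir: (left_only, right_only, diff_files)}."""
    result = {
        rel: (sorted(dcmp.left_only), sorted(dcmp.right_only), sorted(dcmp.diff_files))
    }
    # dcmp.subdirs maps each common subdirectory name to its own dircmp
    # object, so recursing over it walks the whole tree.
    for name, sub in dcmp.subdirs.items():
        result.update(collect(sub, f"{rel}/{name}"))
    return result
```

For example, collect(filecmp.dircmp('mnet', 'mncvs')) gathers the only-in and differing lists for every directory pair under the two roots, keyed by relative path.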

John Hunter