fdups: calling for beta testers

John Machin sjmachin at lexicon.net
Sat Feb 26 21:06:32 EST 2005


On Sat, 26 Feb 2005 23:53:10 +0100, Patrick Useldinger
<pu.news.001 at gmail.com> wrote:

> I've tested it intensively

"Famous Last Words" :-)

>Thanks for your feedback!

Here's some more:

(1) Software that produces lots of files all the same size: the Borland
C[++] compiler produces a debug symbol file (.tds) that's always
384KB; I have 144 of these on my HD, rarely more than 1 in the same
directory.

Here's a snippet from a duplicate detection run:

DUP|393216|2|\devel\delimited\build\lib.win32-1.5\delimited.tds|\devel\delimited\build\lib.win32-2.1\delimited.tds
DUP|393216|2|\devel\delimited\build\lib.win32-2.3\delimited.tds|\devel\delimited\build\lib.win32-2.4\delimited.tds

(2) There appears to be a flaw in your logic such that it will find
duplicates only if they are in the *SAME* directory and only when
there are no other directories with two or more files of the same
size. The above duplicates were detected only when I made the
following changes to your script:


--- fdups       Sat Feb 26 06:41:36 2005
+++ fdups_jm.py Sun Feb 27 12:18:04 2005
@@ -29,13 +29,14 @@
         self.count = self.totalsize     = self.inodecount = self.slinkcount = 0
         self.gain  = self.bytescompared = self.bytesread  = self.inodecount = 0
         for toplevel in args:
-            os.path.walk(toplevel, self.buildList, None)
+            os.path.walk(toplevel, self.updateDict, None)
         if self.count > 0:
             self.compare()

-    def buildList(self,arg,dirpath,namelist):
-        """ build a dictionnary of files to be analysed, indexed by length """
-        files = {}
+    def updateDict(self,arg,dirpath,namelist):
+        """ update a dictionary of files to be analysed, indexed by length """
+        # files = {}
+        files = self.compfiles
         for filepath in namelist:
             fullpath = os.path.join(dirpath,filepath)
             if os.path.isfile(fullpath):
@@ -51,20 +52,23 @@
                         if  size >= MIN_FILESIZE:
                             self.count += 1
                             self.totalsize += size
+                            # is above totalling in the wrong place?
                             if size not in files:
                                 files[size]=[fullpath]
                             else:
                                 files[size].append(fullpath)
-        for size in files:
-            if len(files[size]) != 1:
-                self.compfiles[size]=files[size]
+        # for size in files:
+        #     if len(files[size]) != 1:
+        #         self.compfiles[size]=files[size]

     def compare(self):
         """ compare all files of the same size  - outer loop """
         sizes=self.compfiles.keys()
         sizes.sort()
         for size in sizes:
-            self.comparefiles(size,self.compfiles[size])
+            list_of_filenames = self.compfiles[size]
+            if len(list_of_filenames) > 1:
+                self.comparefiles(size, list_of_filenames)

     def comparefiles(self,size,filelist):
         """ compare all files of the same size  - inner loop """

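For comparison, here is the same idea as a Python 3 sketch (os.walk instead of os.path.walk, which Python 3 dropped): one dictionary shared by every directory walked, with the "two or more files" filter applied only at comparison time. The names group_by_size, candidate_groups and min_filesize are hypothetical stand-ins for fdups' own buildList/compare and MIN_FILESIZE, and skipping symlinks is an assumption, not necessarily what fdups does.

```python
import os
from collections import defaultdict


def group_by_size(toplevels, min_filesize=1):
    """Map file size -> list of paths across *all* directories walked.

    A single dictionary is shared by every directory, so same-size files
    in different directories end up in the same list.
    """
    files = defaultdict(list)
    for toplevel in toplevels:
        for dirpath, _dirnames, filenames in os.walk(toplevel):
            for name in filenames:
                fullpath = os.path.join(dirpath, name)
                # Assumption: ignore symlinks and anything not a regular file.
                if os.path.islink(fullpath) or not os.path.isfile(fullpath):
                    continue
                size = os.path.getsize(fullpath)
                if size >= min_filesize:
                    files[size].append(fullpath)
    return files


def candidate_groups(files):
    """Yield (size, paths) only for sizes shared by two or more files."""
    for size in sorted(files):
        paths = files[size]
        if len(paths) > 1:
            yield size, paths
```

With the delimited.tds builds from (1) sitting in different directories, both land under the same size key and reach the comparison step; a per-directory dictionary never puts them in the same list.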

(3) Your fdups-check gadget doesn't work on Windows; the commands
module works only on Unix but is supplied with Python on all
platforms. The results might just confuse a newbie:

(1, "'{' is not recognized as an internal or external command,\noperable program or batch file.")

Why not use the Python filecmp module?
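Something along these lines might do, as a Python 3 sketch; duplicate_sets is a hypothetical helper, not part of fdups or the standard library. Note shallow=False: with the default shallow=True, filecmp.cmp treats two files whose os.stat() signatures (file type, size, modification time) match as equal without reading their contents, which is not good enough for duplicate detection.

```python
import filecmp


def duplicate_sets(paths):
    """Partition same-size files into lists of byte-identical files.

    shallow=False makes filecmp.cmp compare file contents instead of
    trusting matching os.stat() signatures.
    """
    groups = []  # each entry is a list of paths with identical contents
    for path in paths:
        for group in groups:
            # Compare against one representative of each existing group.
            if filecmp.cmp(group[0], path, shallow=False):
                group.append(path)
                break
        else:
            # No existing group matched: start a new one.
            groups.append([path])
    # Only groups with two or more members are duplicates.
    return [g for g in groups if len(g) > 1]
```

Because byte-for-byte equality is transitive, comparing each file with one representative per group is enough. The worst case (all files different) is still quadratic in the group size, which could matter when many files share a size, as with the .tds files in (1).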

Cheers,
John


