fdups: calling for beta testers

Patrick Useldinger pu at luxemburg.lu
Sat Feb 26 04:43:21 EST 2005


John Machin wrote:

> (1) It's actually .bz2, not .bz (2) Why annoy people with the
> not-widely-known bzip2 format just to save a few % of a 12KB file?? (3)
> Typing that on Windows command line doesn't produce a useful result (4)
> Haven't you heard of distutils?

(1) Typo, thanks for pointing it out.
(2)(3) In the Linux world, it is really popular. I suppose you are a 
Windows user, and I haven't given that much thought. The point was not 
to save space, just to use the "standard" format. What would it be for 
Windows - zip?
(4) Never used it, but that is a very valid point. I will look into it.

> (6) You are keeping open handles for all files of a given size -- have
> you actually considered the possibility of an exception like this:
> IOError: [Errno 24] Too many open files: 'foo509'

(6) Not much I can do about this. In the beginning, all files of equal 
size are potentially identical. I first need to read a chunk of each, 
and if I want to avoid opening & closing files all the time, I need them 
open together.
What would you suggest?
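
One possible trade-off, as a minimal sketch and not what fdups currently 
does: reopen each file for every chunk, so that only one handle is open at 
a time. That costs exactly the repeated open/close mentioned above. The 
names read_chunk and group_identical and the BLOCKSIZE value are 
hypothetical, and the code assumes all paths passed in have the same size.

```python
from collections import defaultdict

BLOCKSIZE = 8192  # hypothetical chunk size, not taken from fdups


def read_chunk(path, offset, size=BLOCKSIZE):
    """Open the file, read one chunk at the given offset, and close it again."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(size)


def group_identical(paths, blocksize=BLOCKSIZE):
    """Split same-size files into groups of files with identical content.

    Only one file is open at any moment, so the open-file limit is not
    reached; the price is one open/seek/close per file per chunk.
    """
    pending = [list(paths)]  # candidate groups still being compared
    finished = []            # groups confirmed identical up to end of file
    offset = 0
    while pending:
        next_round = []
        for group in pending:
            by_chunk = defaultdict(list)
            for path in group:
                by_chunk[read_chunk(path, offset, blocksize)].append(path)
            for chunk, members in by_chunk.items():
                if len(members) < 2:
                    continue                  # unique chunk: not a duplicate
                if len(chunk) < blocksize:
                    finished.append(members)  # short read: end of file reached
                else:
                    next_round.append(members)
        pending = next_round
        offset += blocksize
    return finished
```

A middle road would be to keep handles open only while a group is smaller 
than some limit and fall back to this approach otherwise.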

> Once upon a time, max 20 open files was considered as generous as 640KB
> of memory. Looks like Bill thinks 512 (open files, that is) is about
> right these days.

Bill also thinks it is normal that half of service pack 2 lingers twice 
on a hard disk. Not sure whether he's my hero ;-)

> (7)
> Why sort? What's wrong with just two lines:
> 
> ! for size, file_list in self.compfiles.iteritems():
> !     self.comparefiles(size, file_list)

(7) I wanted the output to be sorted by file size, instead of being 
random. It's psychological, but if you're chasing dups, you'd want to 
start with the largest ones first. If you have more than a screenful 
of info, it's the last lines that are the most interesting. And it will 
produce the same info in the same order if you run it twice on the same 
folders.
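
To illustrate the ordering argument, here is a small sketch in present-day 
Python. The names compfiles and comparefiles come from the quoted snippet; 
the wrapper function report_in_size_order is hypothetical, and the real 
method in fdups may look different.

```python
def report_in_size_order(compfiles, comparefiles):
    """Call comparefiles for each size group, smallest size first.

    compfiles maps a file size to the list of files of that size.
    Sorting the keys makes the order deterministic and puts the largest
    files on the last lines of the output.
    """
    for size in sorted(compfiles):
        comparefiles(size, compfiles[size])


if __name__ == "__main__":
    # Made-up sample data, only to show the call order.
    sample = {4096: ["b1", "b2"], 12: ["a1", "a2"], 1048576: ["c1", "c2"]}
    report_in_size_order(sample, lambda size, files: print(size, files))
```

Iterating over the dictionary directly, as in the two-line version, gives 
no guarantee that the sizes come out in numeric order.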

> (8)     global
> MIN_FILESIZE,MAX_ONEBUFFER,MAX_ALLBUFFERS,BLOCKSIZE,INODES
> 
> That doesn't sit very well with the 'everything must be in a class'
> religion seemingly espoused by the following:

(8) Agreed. I'll think about that.

> (9) Any good reason why the "executables" don't have ".py" extensions
> on their names?

(9) Because I am lazy and Linux doesn't care. I suppose Windows does?

> All in all, a very poor "out-of-the-box" experience. Bear in mind that
> very few Windows users would have even heard of bzip2, let alone have a
> bzip2.exe on their machine. They wouldn't even be able to *open* the
> box.

As I said, I did not give Windows users much thought. I will improve this.

> And what is "chown" -- any relation of Perl's "chomp"?

chown is a Unix command to change the owner or the group of a file. It 
has to do with controlling access to the file. It is not relevant on 
Windows. No relation to Perl's chomp.
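
In Python the same operation is os.chown, which is only available on Unix. 
A portable script can guard the call roughly like this; the helper name 
chown_if_supported is hypothetical and this is only a sketch, not code 
from fdups.

```python
import os


def chown_if_supported(path, uid=-1, gid=-1):
    """Change owner and group of path where the platform offers os.chown.

    Returns True if os.chown was called and False on platforms that do
    not provide it (for example Windows). Passing -1 for uid or gid
    leaves that id unchanged.
    """
    if not hasattr(os, "chown"):
        return False
    os.chown(path, uid, gid)
    return True
```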

Thank you very much for your feedback. Did you actually run it on your 
Windows box?

-pu


