fdups: calling for beta testers
Patrick Useldinger
pu at luxemburg.lu
Sat Feb 26 04:43:21 EST 2005
John Machin wrote:
> (1) It's actually .bz2, not .bz (2) Why annoy people with the
> not-widely-known bzip2 format just to save a few % of a 12KB file?? (3)
> Typing that on Windows command line doesn't produce a useful result (4)
> Haven't you heard of distutils?
(1) Typo, thanks for pointing it out
(2)(3) In the Linux world, it is really popular. I suppose you are a
Windows user, and I haven't given that much thought. The point was not
to save space, just to use the "standard" format. What would it be for
Windows - zip?
(4) Never used it, but a very valid point. I will look into it.
> (6) You are keeping open handles for all files of a given size -- have
> you actually considered the possibility of an exception like this:
> IOError: [Errno 24] Too many open files: 'foo509'
(6) Not much I can do about this. In the beginning, all files of equal
size are potentially identical. I first need to read a chunk of each,
and if I want to avoid opening & closing files all the time, I need them
open together.
What would you suggest?
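One possible trade-off, sketched below, is to give up the "open together" part and reopen each file for every chunk read. This is not how fdups works today; the names read_chunk and group_identical and the default block size are made up for illustration, and the code assumes Python 3 and that all paths passed in have the same size. It is slower because of the repeated open/seek/close, but only one handle is open at any moment.

```python
import os
from collections import defaultdict


def read_chunk(path, offset, blocksize):
    """Open the file, read one block at the given offset, and close it again."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(blocksize)


def group_identical(paths, blocksize=8192):
    """Split same-sized files into groups of files with identical content.

    Hypothetical helper: assumes every path in `paths` has the same size.
    Only one file is open at a time, so the open-file limit is never reached.
    """
    if not paths:
        return []
    size = os.path.getsize(paths[0])
    groups = [list(paths)]
    offset = 0
    while offset < size and groups:
        refined = []
        for group in groups:
            # Bucket the files of this group by the content of the current chunk.
            by_chunk = defaultdict(list)
            for path in group:
                by_chunk[read_chunk(path, offset, blocksize)].append(path)
            # A file whose chunk matches no other file cannot have a duplicate.
            refined.extend(g for g in by_chunk.values() if len(g) > 1)
        groups = refined
        offset += blocksize
    # Drop singleton groups (only possible when the loop never ran).
    return [g for g in groups if len(g) > 1]
```

A middle ground would be to keep a bounded number of handles open and fall back to reopening only when a size group is larger than that bound.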
> Once upon a time, max 20 open files was considered as generous as 640KB
> of memory. Looks like Bill thinks 512 (open files, that is) is about
> right these days.
Bill also thinks it is normal that half of service pack 2 lingers twice
on a hard disk. Not sure whether he's my hero ;-)
> (7)
> Why sort? What's wrong with just two lines:
>
> ! for size, file_list in self.compfiles.iteritems():
> !     self.comparefiles(size, file_list)
(7) I wanted the output to be sorted by file size, instead of being
random. It's psychological, but if you're chasing dups, you'd want to
start with the largest ones first. If you have more than a screenful
of info, it's the last lines which are the most interesting. And it will
produce the same info in the same order if you run it twice on the same
folders.
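For illustration, sorted iteration costs only one extra call compared with the two-line version. This is a sketch in Python 3 syntax (the quoted code uses the Python 2 iteritems()); the helper name in_size_order and the sample dictionary are invented, with compfiles assumed to map a file size to a list of file names.

```python
def in_size_order(compfiles):
    """Yield (size, file_list) pairs from the smallest size to the largest."""
    for size in sorted(compfiles):
        yield size, compfiles[size]


# Made-up sample data: file size -> files of that size.
compfiles = {4096: ["c", "d"], 12: ["a", "b"], 700000: ["e", "f"]}

for size, file_list in in_size_order(compfiles):
    # The largest candidates are printed last, so they stay on screen.
    print(size, file_list)
```

Because the keys are sorted on every run, two runs over the same folders visit the sizes in the same order.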
> (8) global
> MIN_FILESIZE,MAX_ONEBUFFER,MAX_ALLBUFFERS,BLOCKSIZE,INODES
>
> That doesn't sit very well with the 'everything must be in a class'
> religion seemingly espoused by the following:
(8) Agreed. I'll think about that.
> (9) Any good reason why the "executables" don't have ".py" extensions
> on their names?
(9) Because I am lazy and Linux doesn't care. I suppose Windows does?
> All in all, a very poor "out-of-the-box" experience. Bear in mind that
> very few Windows users would have even heard of bzip2, let alone have a
> bzip2.exe on their machine. They wouldn't even be able to *open* the
> box.
As I said, I did not give Windows users much thought. I will improve this.
> And what is "chown" -- any relation of Perl's "chomp"?
chown is a Unix command to change the owner or the group of a file. It
has to do with controlling access to the file. It is not relevant on
Windows. No relation to Perl's chomp.
Thank you very much for your feedback. Did you actually run it on your
Windows box?
-pu