program to generate data helpful in finding duplicate large files

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Sep 19 01:45:51 EDT 2014


David Alban wrote:

> *#!/usr/bin/python*
> 
> *import argparse*
> *import hashlib*
> *import os*
> *import re*
> *import socket*
> *import sys*

Um, how did you end up with leading and trailing asterisks? That's going to
stop your code from running.


> *from stat import **

"import *" is mildly discouraged. It's not bad, per se, but it is mostly
designed for use at the interactive interpreter, and it can lead to a few
annoyances (such as silently shadowing names you already defined) if you
don't know what you are doing. So be careful of using it when you don't
need to.
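
For illustration, here is a rough sketch of the alternative: import the
module and qualify the names you use. I'm assuming the script only needs a
handful of things from stat; the helper name is_regular_file is made up for
this example and is not from David's code.

```python
import os
import stat  # keep the namespace, so it is obvious where S_ISREG comes from


def is_regular_file(path):
    """Hypothetical helper: True if path is a regular file (symlinks are not followed)."""
    mode = os.lstat(path).st_mode
    return stat.S_ISREG(mode)
```

If you prefer the short names, "from stat import S_ISREG" gives you exactly
the one name you asked for and nothing else.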


[...]
> *start_directory = re.sub( '/+$', '', args.start_directory )*

I don't think you need to do that, and you certainly don't need to pull out
the nuclear-powered bulldozer of regular expressions just to crack the
peanut of stripping trailing slashes from a string.

    start_directory = args.start_directory.rstrip("/")

ought to do the job.
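
A quick sanity check that the two approaches agree, using a few sample
paths I made up (strip_trailing_slashes is a hypothetical wrapper, just so
there is something to call):

```python
import re


def strip_trailing_slashes(path):
    """Remove every trailing '/' from path, without regular expressions."""
    return path.rstrip("/")


# Compare against the original regular expression on some sample inputs.
for sample in ["/home/user///", "/home/user", "relative/dir/", "/"]:
    assert strip_trailing_slashes(sample) == re.sub("/+$", "", sample)
```

Note that both versions turn "/" into the empty string, so if the root
directory is a legal argument you may want to special-case it.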

[...]
> *    f = open( file_path, 'r' )*
> *    md5sum = md5_for_file( f )*

You never close the file, which means Python will close it for you, when it
is good and ready. In the case of some Python implementations, that might
not be until the interpreter shuts down, which could mean that you run out
of file handles!

Better is to explicitly close the file:

    f = open(file_path, 'r')
    md5sum = md5_for_file(f)
    f.close()

or if you are using a recent version of Python and don't need to support
Python 2.4 or older:

    with open(file_path, 'r') as f:
        md5sum = md5_for_file(f)

(The "with" block automatically closes the file when you exit the indented
block.)
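
Putting that together: your md5_for_file wasn't shown in full, so the
version below is only my guess at its shape, reading the file in chunks so
a large file never has to fit in memory. The names md5_for_path and
block_size are my own inventions. I've also opened the file in binary mode
('rb'), which is an assumption on my part about what you want: it makes no
difference on Unix under Python 2, but text mode can translate line endings
on Windows, and Python 3's hashlib wants bytes.

```python
import hashlib


def md5_for_file(f, block_size=1024 * 1024):
    """Return the hex MD5 digest of an open binary file, read in chunks."""
    md5 = hashlib.md5()
    while True:
        block = f.read(block_size)
        if not block:  # empty read means end of file
            break
        md5.update(block)
    return md5.hexdigest()


def md5_for_path(file_path):
    """Hypothetical wrapper: open, hash, and always close the file."""
    with open(file_path, 'rb') as f:
        return md5_for_file(f)
```

The file is closed as soon as the "with" block ends, even if md5_for_file
raises an exception part way through.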

> *    sep = ascii_nul*

Seems a strange choice of a delimiter.

> *    print "%s%c%s%c%d%c%d%c%d%c%d%c%s" % ( thishost, sep, md5sum, sep,
> dev, sep, ino, sep, nlink, sep, size, sep, file_path )*

Arggh, my brain! *wink*

Try this instead:

    # join() only accepts strings, so convert the numeric fields first
    fields = [thishost, md5sum, dev, ino, nlink, size, file_path]
    s = '\0'.join(map(str, fields))
    print s
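
One thing to watch with join: it only accepts strings, so numeric fields
like dev, ino, nlink and size have to be converted with str() first. A
small sketch, using a hypothetical format_record helper and sample values
I made up:

```python
def format_record(fields, sep='\0'):
    """Join fields with sep, converting each one to a string first."""
    return sep.join(str(field) for field in fields)


# Made-up sample values: host, md5sum, dev, ino, nlink, size, path.
record = format_record(
    ["myhost", "d41d8cd98f00b204e9800998ecf8427e", 2049, 131, 1, 0, "/tmp/empty"]
)

# Whatever reads the output can split on the same separator.
parts = record.split('\0')
assert parts[2] == "2049"
```

Everything comes back as a string, so the reader has to call int() on the
numeric fields itself.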

> *exit( 0 )*

No need to explicitly call sys.exit at the end of your code. (Bare exit is
a convenience that the site module installs for interactive use, so don't
rely on it in scripts.) If you exit by falling off the end of your program,
Python uses an exit code of zero. Normally, you should only call sys.exit to:

- exit with a non-zero code;

- exit early.
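
As a sketch of both cases at once, here is a check that bails out early
with a non-zero code when the start directory is bad. The helper name
require_directory and the choice of exit code 1 are my own assumptions, not
something from your script.

```python
import os
import sys


def require_directory(path):
    """Hypothetical helper: exit early with a non-zero status on bad input."""
    if not os.path.isdir(path):
        sys.stderr.write("%s: not a directory\n" % path)
        sys.exit(1)  # early exit, non-zero code
    return path

# Normal completion needs nothing special: when the script runs off the
# end, the interpreter exits with status 0.
```

sys.exit works by raising SystemExit, so any "finally" clauses and "with"
blocks still get a chance to clean up on the way out.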



-- 
Steven



