program to generate data helpful in finding duplicate large files

Fri Sep 19 02:45:22 EDT 2014

On Fri, Sep 19, 2014 at 3:45 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> David Alban wrote:
>> *import sys*
>
> Um, how did you end up with leading and trailing asterisks? That's going to
> stop your code from running.

They're not part of the code, they're part of the mangling of the
formatting. So this isn't a code issue, it's a mailing list /
newsgroup one. David, if you set your mail/news client to send plain
text only (not rich text or HTML or formatted or anything like that),
you'll avoid these problems.

>> *    sep = ascii_nul*
>
> Seems a strange choice of a delimiter.

But one that he explained in his body :)

>> *    print "%s%c%s%c%d%c%d%c%d%c%d%c%s" % ( thishost, sep, md5sum, sep,
>> dev, sep, ino, sep, nlink, sep, size, sep, file_path )*
>
> Arggh, my brain! *wink*
>
> Try this instead:
>
>     s = '\0'.join([thishost, md5sum, dev, ino, nlink, size, file_path])
>     print s

That won't work on its own; several of the values are integers. So
either they need to be str()'d or something in the output system needs
to know to convert them to strings. I'm inclined to the latter option,
which simply means importing print_function from __future__ and
setting sep=chr(0).

>> *exit( 0 )*
>
> No need to explicitly call sys.exit (just exit won't work) at the end of
> your code.

Hmm, you sure exit won't work? I normally use sys.exit to set return
values (though as you say, it's unnecessary at the end of the
program), but I tested it (Python 2.7.3 on Debian) and it does seem to
be functional. Do you know what provides it?

ChrisA