'r' vs 'rb' in csv (was Re: Python SHA-1 as a method for unique file identification ? [help!])

Mon Jun 26 16:39:22 EDT 2006

Tim Peters wrote:
> [EP <eric.pederson at gmail.com>]
> > This inquiry may either turn out to be about the suitability of the
> > SHA-1 (160 bit digest) for file identification, the sha function in
> > Python ... or about some error in my script
>
> It's your script.  Always open binary files in binary mode.  It's a
> disaster on Windows if you don't (if you open a file in text mode on
> Windows, the OS pretends that EOF occurs at the first instance of byte
> chr(26) -- this is an ancient Windows behavior that made an odd kind
> of sense in the mists of history, and has persisted in worship of
> Backward Compatibility despite that the original reason for it went
> away _long_ ago).
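To make Tim's advice concrete, here is a minimal Python 3 sketch of hashing a file for identification. It reads in binary mode and feeds the bytes to `hashlib.sha1` in chunks. The helper `simulate_ctrl_z_truncation` is hypothetical. It only imitates the old Windows text-mode behaviour he describes, where everything after the first chr(26) byte is lost, so you can see why the digests stop matching.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file's exact on-disk bytes."""
    digest = hashlib.sha1()
    # 'rb' returns the raw bytes: no decoding, no newline translation.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def simulate_ctrl_z_truncation(data):
    """Hypothetical helper: mimic legacy Windows text mode, which
    treated the first chr(26) byte as end-of-file."""
    cut = data.find(b"\x1a")
    return data if cut == -1 else data[:cut]


if __name__ == "__main__":
    payload = b"header\x1abinary tail"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        full = sha1_of_file(tmp.name)
        truncated = hashlib.sha1(simulate_ctrl_z_truncation(payload)).hexdigest()
        print(full == hashlib.sha1(payload).hexdigest())  # True
        print(full == truncated)  # False: the bytes after chr(26) were lost
    finally:
        os.remove(tmp.name)
```

Reading in chunks keeps memory use flat for large files. Opening with `'rb'` is what makes the digest depend only on the file's contents, not on the platform.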

On a semi-related note, I have a database on Linux that imports from a
Macintosh CSV file.  The 'csv' module says to always open files in
binary mode, but this didn't work in my case: I had to open it as 'rU'
(text with universal newlines) or 'csv' misparsed it.  I'd like the
program to be portable to Windows and Mac.  Is there a way around this?
Will I really burn in hell for using 'rU'?
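For readers on Python 3, where the `'U'` mode flag was deprecated and later removed, the `csv` documentation instead recommends opening files in text mode with `newline=''`. That enables universal-newline line splitting but hands the `'\r'`, `'\n'` or `'\r\n'` terminators to the reader untranslated, and the reader recognises both `'\r'` and `'\n'` as end-of-line. The sketch below assumes a UTF-8 file. `read_csv_rows` is a hypothetical helper name, and the sample data imitates a classic Mac file with bare `'\r'` line endings.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encoding="utf-8"):
    """Read every row of a CSV file, whatever its line endings.

    newline="" splits lines on '\r', '\n' or '\r\n' but leaves the
    terminators in place so the csv module can handle them itself.
    """
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    # Classic Mac OS line endings: bare carriage returns.
    mac_bytes = b"name,qty\rapple,3\rpear,5\r"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(mac_bytes)
    try:
        print(read_csv_rows(tmp.name))
        # [['name', 'qty'], ['apple', '3'], ['pear', '5']]
    finally:
        os.remove(tmp.name)
```

The same call works for Unix and Windows line endings, so one code path serves all three platforms.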

What was the odd bit of sense?  I know you end console input by typing
ctrl-Z, but I thought it worked like Unix ctrl-D, which ends the input
without actually inserting that character.

--
Mike Orr <sluggoster at gmail.com>