file read, binary or text mode

Roel Schroeven rschroev_nospam_ml at fastmail.fm
Fri Sep 24 18:06:12 EDT 2004


Terry Reedy wrote:

> "Askari" <askari at addressNonValide.com> wrote in message 
> news:Xns956E4CDA892D7askariaddressNonVali at 207.35.177.135...
> 
>>"Guyon Morée" <gumuz at NO_looze_SPAM.net> wrote in
>>news:41540121$0$3891$4d4ebb8e at news.nl.uu.net:
>>
>>"rb" and "r" on a text file are the same if your text file has ASCII
>>characters (8 bit), but not the same for Unicode characters (16 bit).
>>In short, if you are sure that your file is ONLY text, use "r"; otherwise,
>>always use "rb".  And "r" doesn't read control characters other than "\n",
>>"\t", etc.
> 
> 
> Newbies, ignore this confusion.
> 
> On Windows, text mode autoconverts \r\n to \n on input and vice versa on 
> output.  I believe that that is all the difference.  Period.

It's the main difference, but not the only one. From the MSDN 
documentation on fopen:

"t

Open in text (translated) mode. In this mode, CTRL+Z is interpreted as 
an end-of-file character on input. In files opened for reading/writing 
with "a+", fopen checks for a CTRL+Z at the end of the file and removes 
it, if possible. This is done because using fseek and ftell to move 
within a file that ends with a CTRL+Z, may cause fseek to behave 
improperly near the end of the file.

Also, in text mode, carriage return–linefeed combinations are translated 
into single linefeeds on input, and linefeed characters are translated 
to carriage return–linefeed combinations on output. When a Unicode 
stream-I/O function operates in text mode (the default), the source or 
destination stream is assumed to be a sequence of multibyte characters. 
Therefore, the Unicode stream-input functions convert multibyte 
characters to wide characters (as if by a call to the mbtowc function). 
For the same reason, the Unicode stream-output functions convert wide 
characters to multibyte characters (as if by a call to the wctomb 
function)."
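
To make the first two rules in that quote concrete, here is a minimal Python model of what translated mode does to a stream of bytes. It is only a sketch of the behaviour as described in the quote, not the C runtime's actual code, and the function names are made up for illustration.

```python
CTRL_Z = b"\x1a"  # decimal 26


def text_mode_input(raw):
    """Model of reading in translated mode, following the quoted description."""
    # CTRL+Z is interpreted as end-of-file: nothing after it is seen.
    end = raw.find(CTRL_Z)
    if end != -1:
        raw = raw[:end]
    # Carriage return + linefeed pairs become single linefeeds.
    return raw.replace(b"\r\n", b"\n")


def text_mode_output(data):
    """Model of writing in translated mode: each linefeed becomes CR-LF."""
    return data.replace(b"\n", b"\r\n")


print(text_mode_input(b"one\r\ntwo\r\n\x1athree"))  # b'one\ntwo\n'
print(text_mode_output(b"a\nb\n"))                  # b'a\r\nb\r\n'
```

Binary mode, by contrast, would hand back the bytes untouched, including the CTRL+Z and everything after it.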

So there's
- the line endings translation
- the issue of CTRL-Z being treated as end of file on input, and being 
stripped from the end of files opened with "a+" (CTRL-Z is decimal 26 or 
hex 1a, consistent with Ralf's mail)
- the Unicode issue, which I frankly don't understand
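
The first two points are easy to check on whatever platform and Python version you have at hand. The following is a rough sketch, assuming a reasonably recent Python (it uses the "with" statement); the helper name read_both_ways is made up for this example. It writes a few bytes in binary mode and reads them back both ways.

```python
import os
import tempfile


def read_both_ways(data):
    """Write data in binary mode, then read it back in "rb" and in "r" mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            as_binary = f.read()
        with open(path, "r") as f:
            as_text = f.read()
    finally:
        os.remove(path)
    return as_binary, as_text


# Two CR-LF terminated lines, then a CTRL-Z, then more data.
as_binary, as_text = read_both_ways(b"one\r\ntwo\r\n\x1athree")
print(repr(as_binary))
print(repr(as_text))
```

Binary mode should return the bytes unchanged. What text mode returns depends on the platform and the Python version: a text mode that goes through the C runtime's fopen on Windows would be expected to stop at the CTRL-Z, whereas Python 3 does its own newline translation on every platform and, as far as I know, passes byte 26 through as an ordinary character.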

-- 
"Codito ergo sum"
Roel Schroeven


