open() in binary vs. text mode

John Machin sjmachin at lexicon.net
Fri Mar 21 20:27:46 EST 2003


bobnotbob at byu.edu (Bob Roberts) wrote in message news:<c4e6b17d.0303211145.696c10ce at posting.google.com>...
> 
> When in windows, reading in text mode, if it came across ASCII
> character 26, it would quit and not read any more of the file.  This
> does not happen on other platforms or on windows when reading in
> binary mode.
> 
> Why would a specific character cause this behavior?

In text mode on Windows, ctrl-Z (ASCII 26, 0x1A) is treated as
end-of-file. The behaviour is inherited from CP/M via MS-DOS, as was
the use of CRLF as line terminator. CP/M files were a whole number of
128-byte sectors, so the exact length of a file in bytes was not
recorded. The convention was that in files containing text, the actual
text was terminated by ctrl-Z, and the remainder of the sector was
(usually) padded out with NULs. The "stdio" kits for C compilers on
CP/M, MS-DOS and Windows treat an input ctrl-Z as EOF in text mode,
i.e. this is not a Python-only feature.
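
To make the convention concrete, here is a rough sketch in Python. The
function names (pad_cpm_text, text_before_ctrl_z) are made up for this
illustration and the code only models the convention described above;
it is not how any particular C runtime implements it.

```python
CTRL_Z = b"\x1a"     # ASCII 26, the end-of-text marker
SECTOR_SIZE = 128    # CP/M files were a whole number of 128-byte sectors


def pad_cpm_text(text: bytes) -> bytes:
    """Terminate text with ctrl-Z, then pad with NULs to a sector boundary."""
    data = text + CTRL_Z
    shortfall = -len(data) % SECTOR_SIZE   # bytes needed to fill the last sector
    return data + b"\x00" * shortfall


def text_before_ctrl_z(data: bytes) -> bytes:
    """Return only the bytes before the first ctrl-Z, as a ctrl-Z-aware
    text-mode reader would; everything after it is ignored."""
    head, _sep, _tail = data.partition(CTRL_Z)
    return head


sector = pad_cpm_text(b"hello\r\n")
```

The same truncation is what bites you when a ctrl-Z turns up in the
middle of a data file: text_before_ctrl_z(b"abc\x1adef") gives back
only b"abc".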

Unfortunately many applications don't apply elementary validation
(like "names shouldn't contain control characters"), so one can be
handed files with embedded ctrl-Zs (typically a typo for shift-Z).
Consequently one needs to be ctrl-Z-aware; paranoid programmers read
data files in binary mode and validate their contents.
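
A minimal sketch of the paranoid approach, assuming that tab, CR and LF
are the only control characters your data may legitimately contain. The
helper names (find_control_chars, read_validated) are hypothetical;
adjust the allowed set to suit your files.

```python
def find_control_chars(data: bytes, allowed: bytes = b"\t\r\n"):
    """Return (offset, byte value) pairs for each disallowed control character."""
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if (value < 0x20 or value == 0x7F) and value not in allowed
    ]


def read_validated(path):
    """Read a file in binary mode and refuse it if it holds stray control characters."""
    with open(path, "rb") as f:   # "rb": no ctrl-Z or CRLF processing
        data = f.read()
    problems = find_control_chars(data)
    if problems:
        offset, value = problems[0]
        raise ValueError(
            "%s: control character 0x%02X at offset %d (%d in total)"
            % (path, value, offset, len(problems))
        )
    return data
```

Because the file is opened with "rb", an embedded ctrl-Z is reported
with its offset instead of silently cutting the input short.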



