[Python-Dev] New lines, carriage returns, and Windows
Nick Maclaren
nmm1 at cus.cam.ac.uk
Sat Sep 29 20:48:20 CEST 2007
"Guido van Rossum" <guido at python.org> wrote:
>
> Have you looked at Py3k at all, especially PEP 3116 (new I/O)?
No.
> Python *does* have its own I/O model. There are binary files and text
> files. For binary files, you write bytes and the semantic model is
> that of an array of bytes; byte indices are seek positions.
That is the same model as C and Unix. It is text files that we are
discussing.
> For text files, the contents are considered to be Unicode, encoded as
> bytes in a binary file. So a text file always has an underlying binary
> file. Two translations take place, both of which have defaults varying
> by platform. One translation is encoding Unicode text into bytes upon
> output, and decoding bytes to Unicode text upon input. This can use
> any encoding supported by the encodings package.
The character code isn't the issue here, and is almost completely
irrelevant.
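
[Editorial illustration] The layering described in the quoted text can be sketched as follows. This is a minimal sketch, assuming the io module as it later shipped in Python 3 (io.BytesIO, io.TextIOWrapper); the helper name encode_via_text_layer is hypothetical and is not taken from the thread or from PEP 3116.

```python
import io


def encode_via_text_layer(s, encoding):
    """Write s through a text layer and return the bytes that reach
    the underlying binary file (illustrative helper, not a real API)."""
    raw = io.BytesIO()  # the binary file: an array of bytes
    # newline="" disables line-ending translation so only the
    # character encoding step is visible here.
    text = io.TextIOWrapper(raw, encoding=encoding, newline="")
    text.write(s)
    text.flush()
    data = raw.getvalue()
    text.detach()  # leave raw open; the wrapper is no longer usable
    return data


utf8_bytes = encode_via_text_layer("na\u00efve", "utf-8")
latin1_bytes = encode_via_text_layer("na\u00efve", "latin-1")

# In the binary layer, seek positions are byte indices.
raw = io.BytesIO(utf8_bytes)
raw.seek(2)
encoded_i_diaeresis = raw.read(2)  # the two UTF-8 bytes of the third character
```

The same five-character string occupies six bytes in UTF-8 and five in Latin-1, which is why byte offsets belong to the binary layer and not to the text layer.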
> The other translation deals with line endings. Upon input, any of
> \r\n, \r, or \n is translated to a single \n by default (this is the
> "universal newlines" algorithm from Python 2.x). This can be tweaked
> or disabled. Upon output, \n is translated into a platform specific
> string chosen from \r\n, \r, or \n. This can also be disabled or
> overridden. Note that \r, when written, is never treated specially; if
> you want special processing for \r on output, you can write your own
> translation layer.
Grrk. That's the problem. You don't get back what you have written,
for a start, which isn't nice. There are other issues, too.
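
[Editorial illustration] The round-trip concern can be sketched as follows. This is a minimal sketch, assuming the newline parameter of io.TextIOWrapper as it later shipped in Python 3; the helper name round_trip is hypothetical, and "\r\n" is passed explicitly on output to stand in for the Windows default so the result does not depend on the platform running it.

```python
import io


def round_trip(text, write_newline, read_newline):
    """Write text with one newline setting, read it back with another.
    Returns (bytes on disk, text read back). Illustrative helper only."""
    raw = io.BytesIO()
    writer = io.TextIOWrapper(raw, encoding="utf-8", newline=write_newline)
    writer.write(text)
    writer.flush()
    data = raw.getvalue()
    writer.detach()
    reader = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8",
                              newline=read_newline)
    return data, reader.read()


original = "a\rb\nc"

# Output translates "\n" to "\r\n" and leaves the lone "\r" untouched;
# input with universal newlines (newline=None) maps both to "\n".
translated_bytes, translated_text = round_trip(original, "\r\n", None)

# newline="" disables translation in both directions.
plain_bytes, plain_text = round_trip(original, "", "")
```

With translation enabled, the bytes written are b"a\rb\r\nc" and the text read back is "a\nb\nc", so the lone \r in the original is not recovered. With newline="" on both sides the text survives unchanged.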
> That's all. There is nothing unimplementable or confusing in these
> specifications.
Nothing unimplementable, I agree. Nothing confusing? Not in the
experience of the users I have dealt with.
> Python doesn't care about record I/O on legacy OSes; it does care
> about variability found in practice between popular OSes.
As a short-term solution, that is fine. But I have seen the wheel
turn a couple of times in 40 years, and expect it to continue after
I am safely 6' under ....
> Note that \r, \n and friends in Python 3000 are either ASCII (in bytes
> literals) or Unicode (in text literals). Again, no support for legacy
> systems that don't use ASCII or a superset.
That's not a problem. I don't see that changing in the foreseeable
future.
> Legacy OSes are called that for a reason.
Well, I remember when the text I/O model that C, Unix and Python
use WAS a feature of legacy OSs :-)
Seriously.
Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: nmm1 at cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679