[Python-Dev] [python] Re: New lines, carriage returns, and Windows

Michael Foord fuzzyman at voidspace.org.uk
Sat Sep 29 17:30:26 CEST 2007


Guido van Rossum wrote:
> [snip..]
> Python *does* have its own I/O model. There are binary files and text
> files. For binary files, you write bytes and the semantic model is
> that of an array of bytes; byte indices are seek positions.
>
> For text files, the contents is considered to be Unicode, encoded as
> bytes in a binary file. So text file always has an underlying binary
> file. Two translations take place, both of which have defaults varying
> by platform. One translation is encoding Unicode text into bytes upon
> output, and decoding bytes to Unicode text upon input. This can use
> any encoding supported by the encodings package.
>
> The other translation deals with line endings. Upon input, any of
> \r\n, \r, or \n is translated to a single \n by default (this is the
> "universal newlines" algorithm from Python 2.x). This can be tweaked
> or disabled. Upon output, \n is translated into a platform specific
> string chosen from \r\n, \r, or \n. This can also be disabled or
> overridden. Note that \r, when written, is never treated specially; if
> you want special processing for \r on output, you can write your own
> translation layer.
>   
So the question is: when a string containing '\r\n' is written to a 
file in text mode on a Windows platform, should it be written as the 
encoded representation of '\r\n' or of '\r\r\n'?

Purity would dictate the latter and practicality the former (IMO)...

However, that would mean that round tripping a string would change it 
('\r\n' would be written as '\r\n' and then read as '\n') - on the other 
hand (particularly given that we are treating the data as text and not a 
binary blob) I don't see how writing '\r\r\n' would ever actually be 
useful in text.

+1 on just writing '\r\n' from me.
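
To make the two options concrete, here is a minimal sketch using the io
module as it exists in current Python 3 releases, which postdates this
thread. The helper names write_text and read_text are hypothetical, and
passing newline='\r\n' explicitly stands in for the Windows default so
the result does not depend on the platform it is run on.

```python
import io


def write_text(s, newline):
    """Write s through a text layer and return the bytes that reach the binary file."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    text.write(s)
    text.flush()
    data = raw.getvalue()
    text.detach()  # leave raw open; we already copied its contents
    return data


def read_text(data, newline=None):
    """Read bytes back through a text layer (newline=None means universal newlines)."""
    return io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=newline).read()


# Output translation only looks at '\n', so an existing '\r' is passed through.
on_disk = write_text("a\r\n", newline="\r\n")
print(repr(on_disk))

# Reading the bytes back with universal newlines does not restore the original string.
print(repr(read_text(on_disk)))

# newline='' disables translation in both directions, so the string round-trips.
print(repr(read_text(write_text("a\r\n", newline=""), newline="")))
```

With this model, a string that already contains '\r\n' only round-trips
unchanged when translation is disabled on both the write and the read.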

Michael Foord
http://www.manning.com/foord


> That's all. There is nothing unimplementable or confusing in these
> specifications.
>
> Python doesn't care about record I/O on legacy OSes; it does care
> about variability found in practice between popular OSes.
>
> Note that \r, \n and friends in Python 3000 are either ASCII (in bytes
> literals) or Unicode (in text literals). Again, no support for legacy
> systems that don't use ASCII or a superset.
>
> Legacy OSes are called that for a reason.
>
>   


