[Python-Dev] [python] Re: New lines, carriage returns, and Windows

Michael Foord fuzzyman at voidspace.org.uk
Sat Sep 29 20:35:53 CEST 2007


Terry Reedy wrote:
> "Michael Foord" <fuzzyman at voidspace.org.uk> wrote in message 
> news:46FE6F92.40601 at voidspace.org.uk...
> | Guido van Rossum wrote:
>
> [snip first part of nice summary of Python i/o model]
>
> | > The other translation deals with line endings. Upon input, any of
> | > \r\n, \r, or \n is translated to a single \n by default (this is nhe [sic]
> | > "universal newlines" algorithm from Python 2.x). This can be tweaked
> | > or disabled. Upon output, \n is translated into a platform specific
> | > string chosen from \r\n, \r, or \n. This can also be disabled or
> | > overridden. Note that \r, when written, is never treated specially; if
> | > you want special processing for \r on output, you can write your own
> | > translation layer.
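A minimal sketch of the two translations described above, using the `io` module as it later shipped in Python 3. This assumes the 2007 design under discussion matches the released behaviour, which may not hold in every detail. `newline="\r\n"` is passed explicitly to stand in for the Windows default, so the sketch behaves the same on any platform.

```python
import io

# Input: with the default newline=None ("universal newlines"),
# \r\n, \r and \n are all translated to a single \n.
reader = io.TextIOWrapper(io.BytesIO(b"one\r\ntwo\rthree\n"), encoding="ascii")
decoded = reader.read()
print(repr(decoded))  # 'one\ntwo\nthree\n'

# Output: newline="\r\n" stands in for the Windows platform default.
# Every \n written becomes \r\n; a lone \r is not treated specially.
sink = io.BytesIO()
writer = io.TextIOWrapper(sink, encoding="ascii", newline="\r\n")
writer.write("a\nb\rc\n")
writer.flush()
encoded = sink.getvalue()
print(encoded)  # b'a\r\nb\rc\r\n'
```

Passing `newline=""` instead disables the translation in both directions.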
>
> | So the question is: when a string containing '\r\n' is written to a
> | file in text mode on a Windows platform, should it be written as the
> | encoded representation of '\r\n' or of '\r\r\n'?
>
> I think Guido pretty clearly said that on output, the default behavior is 
> that \r is nothing special.  If you want a special case exception, write a 
> special case translator. +1 from me.
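The kind of special-case translator suggested here could be a thin wrapper that collapses embedded '\r\n' pairs before the standard text layer applies its own \n translation. This is a minimal sketch: `CollapsingCRLFWriter` is a hypothetical name, not part of any library, and it assumes a '\r\n' pair is never split across two `write()` calls.

```python
import io


class CollapsingCRLFWriter:
    # Hypothetical special-case translator (not a stdlib class).
    # It rewrites every "\r\n" pair as "\n", then lets the wrapped
    # text stream apply its normal newline translation.
    # Assumption: a "\r\n" pair never straddles two write() calls.

    def __init__(self, stream):
        self._stream = stream

    def write(self, text):
        # A lone "\r" passes through untouched, as in the default layer.
        self._stream.write(text.replace("\r\n", "\n"))
        return len(text)

    def flush(self):
        self._stream.flush()


# Usage: simulate Windows output translation on any platform.
sink = io.BytesIO()
out = CollapsingCRLFWriter(
    io.TextIOWrapper(sink, encoding="ascii", newline="\r\n"))
out.write("typed in a UI\r\nsecond line\n")
out.flush()
print(sink.getvalue())  # b'typed in a UI\r\nsecond line\r\n'
```

Keeping this outside the default layer leaves the standard rule ("\n in, platform line ending out") unchanged for everyone else.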
>
> To propose otherwise is to propose that the default semantic meaning of 
> Python text objects depends on the platform that it might be 
> output-translated for.  I believe the point of universal newline support 
> was to get away from this.
>
> | Purity would dictate the latter and practicality the former (IMO)...
>
> I disagree.  Special case exceptions complicate both learnability and code 
> readability and maintainability.  Simplicity is practicality.  The symmetry 
> of 'platform-line-endings =input> \n =output> platform-line-endings' is both 
> pure and practical.
>
> | However, that would mean that round tripping a string would change it
> | ('\r\n' would be written as '\r\n' and then read as '\n')
>
> Whereas \r\r\n would be read back as \r\n, which is what should happen. 
> Round-trip-ability is practical to me.
>
> | - on the other
> | hand (particularly given that we are treating the data as text and not a
> | binary blob) I don't see how writing '\r\r\n' would ever actually be
> | useful in text.
>
> There are two normal ways for internal Python text to have \r\n:
> 1. Read from a file with \r\r\n.  Then \r\r\n is correct output (on the 
> same platform).
> 2. Intentionally put there by a programmer.  If s/he also chooses default \n 
> translation on output, \r<translation of \n> is correct.
>   
Actually, I usually get these strings from Windows UI components. A file 
containing '\r\n' is read in with '\r\n' being translated to '\n'. New 
user input is added containing '\r\n' line endings. The file is written 
out and now contains a mix of '\r\n' and '\r\r\n'.
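The scenario above can be reproduced in a few lines. This sketch again assumes the Python 3 `io` module and uses `newline="\r\n"` to stand in for Windows; the file contents and the UI string are made-up sample data.

```python
import io

# A file written on Windows: every line ends in \r\n.
on_disk = b"line one\r\nline two\r\n"

# Reading in the default (universal newlines) mode turns \r\n into \n.
text = io.TextIOWrapper(io.BytesIO(on_disk), encoding="ascii").read()

# New input from a Windows UI control still carries \r\n.
text += "typed line\r\n"

# Writing with Windows output translation turns each \n into \r\n,
# so the UI line's existing \r\n becomes \r\r\n.
sink = io.BytesIO()
writer = io.TextIOWrapper(sink, encoding="ascii", newline="\r\n")
writer.write(text)
writer.flush()
result = sink.getvalue()
print(result)  # b'line one\r\nline two\r\ntyped line\r\r\n'
```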

Michael

