\r\n or \n notepad editor end line ???

John Machin sjmachin at lexicon.net
Mon Jun 13 20:26:18 EDT 2005


Steven D'Aprano wrote:
> On Mon, 13 Jun 2005 11:53:25 +0200, Fredrik Lundh wrote:
> 
> 
>><ajikoe at gmail.com> wrote:
>>
>>
>>>It means in windows we should use 'wb' to write and 'rb' to read ?
>>>Am I right?
>>
>>no.
>>
>>you should use "wb" to write *binary* files, and "rb" to read *binary*
>>files.
>>
>>if you're working with *text* files (that is, files that contain lines of text
>>separated by line separators), you should use "w" and "r" instead, and
>>treat a single "\n" as the line separator.
> 
> 
> I get nervous when I read instructions like this. It sounds too much like
> voodoo: "Do this, because it works, never mind how or under what
> circumstances, just obey or the Things From The Dungeon Dimensions will
> suck out your brain!!!"
> 
> Sorry Fredrik :-)
> 

Many people don't appear to want to know why; they only want a solution 
to what they perceive to be their current problem.

> When you read a Windows text file using "r" mode, what happens to the \r
> immediately before the newline?

The thing to which you refer is not a "newline". It is an ASCII LF 
character. The CR and the LF together are the physical representation 
(in a Windows text file) of the logical "newline" concept.

Internally, LF is used (irrespective of platform) to represent that concept.

> Do you have to handle it yourself? 

No.

> Or will
> Python cleverly suppress it so you don't have to worry about it?

Suppressed: no, it's a transformation from a physical line termination 
representation to a logical one. Cleverly: matter of opinion. By Python: 
In general, no -- the transformation is handled by the underlying C 
run-time library.
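
A minimal sketch of that physical-to-logical transformation, assuming modern Python 3. There the io module does the translation itself rather than leaving it to the C run-time library, as the Python of this thread did. The temporary file and its contents are invented for the demonstration:

```python
import os
import tempfile

# Write the *physical* Windows representation (CR LF) directly as bytes.
path = os.path.join(tempfile.mkdtemp(), "crlf.txt")
with open(path, "wb") as f:
    f.write(b"spam\r\neggs\r\n")

# Binary mode: the bytes come back untouched, CR and all.
with open(path, "rb") as f:
    raw = f.read()

# Text mode with the default newline=None: each CR LF pair is translated
# to the single logical newline "\n" before your code ever sees it.
with open(path, "r", encoding="ascii") as f:
    text = f.read()

print(repr(raw))   # b'spam\r\neggs\r\n'
print(repr(text))  # 'spam\neggs\n'
```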

> 
> And when you write a text file under Python using "w" mode, will the
> people who come along afterwards to edit the file in Notepad curse your
> name?

If they do, it will not be because something other than CRLF was written 
as the line terminator.
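
A sketch of the writing side, again assuming Python 3. Passing newline="\r\n" reproduces what "w" mode does by default on Windows, so the result is the same on any box. The file name is made up:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.txt")

# newline="\r\n" forces the Windows behaviour: every "\n" written
# in text mode is translated to CR LF on disk.
with open(path, "w", encoding="ascii", newline="\r\n") as f:
    f.write("first line\nsecond line\n")

with open(path, "rb") as f:
    on_disk = f.read()

print(repr(on_disk))  # b'first line\r\nsecond line\r\n'

# With newline=None (the default), Python uses os.linesep instead, so
# the same code writes CR LF on Windows and a bare LF elsewhere.
print(repr(os.linesep))
```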

> Notepad expects \r\n EOL characters, and gets cranky if the \r is
> missing.

AFAIR, it performs well enough for a text editor presented with a file 
consisting of one long unterminated line with occasional embedded 
meaningless-to-the-editor control characters. You can scroll it, edit 
it, write it out again ... any crankiness is likely to be between the 
keyboard and the chair :-)

> 
> How does this behaviour differ from "universal newlines"?
> 

Ordinary behaviour in text mode:

Win: \r\n -> newline -> \r\n
Mac OS < X: \r -> newline -> \r
other box: \n -> newline -> \n

Note: TFM does appear a little light on in this area. I suppose not all 
users of Python have acquired this knowledge by osmosis through decades 
of usage of C on multiple platforms :-)
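
To check which of those physical conventions a file actually uses (say, before blaming Notepad), one can count the terminators in the raw bytes. The helper count_terminators below is hypothetical, a sketch rather than anything in the standard library:

```python
def count_terminators(data: bytes) -> dict:
    """Count CR LF pairs, bare LFs and bare CRs in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Each CR LF pair contains exactly one LF and one CR, so subtracting
    # the pairs from the totals leaves only the bare ones.
    bare_lf = data.count(b"\n") - crlf
    bare_cr = data.count(b"\r") - crlf
    return {"crlf": crlf, "lf": bare_lf, "cr": bare_cr}


print(count_terminators(b"a\r\nb\nc\rd"))  # {'crlf': 1, 'lf': 1, 'cr': 1}
```

Read the file with mode "rb" and pass the bytes in; a Windows text file shows only "crlf", an "other box" file only "lf", a classic Mac file only "cr".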

"Universal newlines":
On *any* box: \r\n or \n or \r (even a mixture) -> \n on reading
On writing, behaviour is "ordinary", i.e. the line terminator is what is 
expected by the current platform.
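
In Python 3, universal newlines is the default when reading in text mode (newline=None); in the Python 2 of this thread it had to be requested with the "U" mode flag (PEP 278). A minimal sketch, using an in-memory io.BytesIO in place of a real file:

```python
import io

raw = b"one\r\ntwo\nthree\rfour"  # a mixture of all three conventions

# newline=None (the default): CR LF, LF or a lone CR each end a line,
# and every one of them is translated to "\n".
universal = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
universal_lines = universal.readlines()
print(universal_lines)  # ['one\n', 'two\n', 'three\n', 'four']

# newline="": lines are still split on any terminator, but the
# terminators are handed back untranslated.
untranslated = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline="")
untranslated_lines = untranslated.readlines()
print(untranslated_lines)  # ['one\r\n', 'two\n', 'three\r', 'four']
```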

"Universal newlines" (if used) solves problems like where an other-boxer 
FTPs a Windows text file in binary mode and then posts laments about all 
those ^M things on the vi screen and :1,$s/^M//g doesn't work :-)
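
The Python equivalent of that vi substitution might look like the sketch below. to_unix_newlines is a hypothetical helper. It assumes the file fits in memory and uses latin-1 only because that encoding round-trips every byte unchanged. It also turns any lone CR into LF, which may not be wanted if the file contains deliberate carriage returns:

```python
import os
import tempfile


def to_unix_newlines(path: str) -> None:
    """Rewrite a text file in place so every line ends in a bare LF.

    Reading with newline=None turns CR LF, LF and a lone CR into a single
    LF; writing with newline="\\n" stops Python adding CRs back on Windows.
    """
    with open(path, "r", encoding="latin-1", newline=None) as f:
        text = f.read()
    with open(path, "w", encoding="latin-1", newline="\n") as f:
        f.write(text)


# Simulate a Windows text file that was FTPed in binary mode.
path = os.path.join(tempfile.mkdtemp(), "ftped.txt")
with open(path, "wb") as f:
    f.write(b"line one\r\nline two\r\n")

to_unix_newlines(path)

with open(path, "rb") as f:
    fixed = f.read()
print(fixed)  # b'line one\nline two\n'
```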

HTH,
John


