[Python-Dev] [Python-3000] Universal newlines support in Python 3.0

Guido van Rossum guido at python.org
Mon Aug 13 22:15:03 CEST 2007


On 8/13/07, Russell E Owen <rowen at cesmail.net> wrote:
> In article <87wsw3p5em.fsf at uwakimon.sk.tsukuba.ac.jp>,
>  "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
>
> > Guido van Rossum writes:
> >
> >  > However, the old universal newlines feature also set an attribute named
> >  > 'newlines' on the file object to a tuple of up to three elements
> >  > giving the actual line endings that were observed on the file so far
> >  > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
> >  > implemented. I'm tempted to kill it. Does anyone have a use case for
> >  > this?
> >
> > I have run into files that intentionally have more than one newline
> > convention used (mbox and Babyl mail folders, with messages received
> > from various platforms).  However, most of the time multiple newline
> > conventions are a sign that the file is either corrupt or isn't text.
> > If so, then saving the file may corrupt it.  The newlines attribute
> > could be used to check for this condition.
>
> There is at least one Mac source code editor (SubEthaEdit) that is all
> too happy to add one kind of newline to a file that started out with a
> different line ending character. As a result I have seen a fair number
> of text files with mixed line endings. I don't see as many these days,
> though; perhaps because the current version of SubEthaEdit handles
> things a bit better. So perhaps it won't matter much for Python 3000.

I've seen similar behavior in MS VC++ (long ago, dunno what it does
these days). It would read files with \r\n and \n line endings, and
whenever you edited a line, that line also got a \r\n ending. But
unchanged lines that started out with \n-only endings would keep the
\n only. And there was no way for the end user to see or control this.

To emulate this behavior in Python you'd have to read the file in
binary mode *or* we'd have to have an additional flag specifying to
return line endings as encountered in the file. The newlines attribute
(as defined in 2.x) doesn't help, because it doesn't tell which lines
used which line ending. I think the newline feature in PEP 3116 falls
short too; it seems mostly there to override the line ending *written*
(from the default os.linesep).
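
To make the binary-mode option concrete, here is a minimal sketch. It
assumes a Python where bytes.splitlines(keepends=True) is available; the
helper names split_keep_endings and edit_line are hypothetical and exist
only for this illustration. It records which ending each line used, then
emulates the editor behavior described above: an edited line gets \r\n,
while untouched lines keep whatever ending they had.

```python
def split_keep_endings(data):
    """Split bytes into (body, ending) pairs, preserving each line's terminator.

    bytes.splitlines breaks only at b"\\n", b"\\r" and b"\\r\\n", so the
    body never contains a CR or LF and rstrip removes the terminator only.
    """
    pairs = []
    for raw in data.splitlines(keepends=True):
        body = raw.rstrip(b"\r\n")
        pairs.append((body, raw[len(body):]))
    return pairs


def edit_line(pairs, index, new_body, ending=b"\r\n"):
    """Return a copy of pairs with one line replaced and given `ending`.

    All other lines keep the terminator they were read with.
    """
    edited = list(pairs)
    edited[index] = (new_body, ending)
    return edited


def join_lines(pairs):
    """Reassemble the (body, ending) pairs into a single bytes object."""
    return b"".join(body + ending for body, ending in pairs)
```

In practice the data would come from open(path, "rb").read(); a line
without a terminator (such as an unterminated last line) is reported
with an empty ending.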

I think we may need different flags for input and for output.

For input, we'd need two things: (a) which are acceptable line
endings; (b) whether to translate acceptable line endings to \n or
not. For output, we need two things again: (c) whether to translate
line endings at all; (d) which line endings to translate. I guess we
could map (c) to (b) and (d) to (a) for a signature that's the same
for input and output (and makes sense for read+write files as well).
The default would be (a)=={'\n', '\r\n', '\r'} and (b)==True.
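
A rough sketch of the input side of this idea follows. It is not the
PEP 3116 API: the function name split_lines and the parameter names
accepted and translate are assumptions made up for this example, standing
in for (a) and (b) respectively.

```python
import re

# Stand-in for the proposed default (a): all three conventions are accepted.
DEFAULT_ENDINGS = frozenset({"\n", "\r\n", "\r"})


def split_lines(text, accepted=DEFAULT_ENDINGS, translate=True):
    """Split text into lines, treating only `accepted` strings as line endings.

    If `translate` is true, each accepted ending is replaced by "\\n";
    otherwise the ending is returned exactly as it appeared in `text`.
    """
    if not accepted:
        # With no acceptable endings, the whole text is a single line.
        return [text] if text else []
    # Try longer endings first so "\r\n" is not read as "\r" followed by "\n".
    alternatives = sorted(accepted, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in alternatives))
    lines = []
    start = 0
    for match in pattern.finditer(text):
        ending = "\n" if translate else match.group()
        lines.append(text[start:match.start()] + ending)
        start = match.end()
    if start < len(text):
        # Trailing text without a terminator is returned as a final line.
        lines.append(text[start:])
    return lines
```

With the defaults, split_lines("a\r\nb\rc\n") gives three lines that all
end in "\n"; with translate=False the original endings are returned, which
is the per-line information the 2.x newlines attribute cannot provide.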

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

