[Python-Dev] [Python-3000] Universal newlines support in Python 3.0

Barry Warsaw barry at python.org
Tue Aug 14 15:58:32 CEST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 13, 2007, at 4:15 PM, Guido van Rossum wrote:

> I've seen similar behavior in MS VC++ (long ago, dunno what it does
> these days). It would read files with \r\n and \n line endings, and
> whenever you edited a line, that line also got a \r\n ending. But
> unchanged lines that started out with \n-only endings would keep the
> \n only. And there was no way for the end user to see or control this.
>
> To emulate this behavior in Python you'd have to read the file in
> binary mode *or* we'd have to have an additional flag specifying to
> return line endings as encountered in the file. The newlines attribute
> (as defined in 2.x) doesn't help, because it doesn't tell which lines
> used which line ending. I think the newline feature in PEP 3116 falls
> short too; it seems mostly there to override the line ending *written*
> (from the default os.linesep).
>
> I think we may need different flags for input and for output.
>
> For input, we'd need two things: (a) which are acceptable line
> endings; (b) whether to translate acceptable line endings to \n or
> not. For output, we need two things again: (c) whether to translate
> line endings at all; (d) which line endings to translate. I guess we
> could map (c) to (b) and (d) to (a) for a signature that's the same
> for input and output (and makes sense for read+write files as well).
> The default would be (a)=={'\n', '\r\n', '\r'} and (b)==True.

I haven't thought about the output side of the equation, but I've  
already hit a situation where I'd like to see the input side (b)  
option implemented.

I'm still sussing out the email package changes (down to 7 failures
and 9 errors out of 247 tests!), but in trying to fix things I found
myself wanting to open files in text mode so that I got strings out of
the file instead of bytes.  This was all fine, except that some of the
tests started failing because of the EOL translation that now happens
unconditionally.  The file contained \r\n, and the test was ensuring
these EOLs were preserved in the parsed text.  I switched back to
opening the file in binary mode and doing a crufty conversion of bytes
to strings (which I suspect is error-prone, but gets me further along).
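
A rough sketch of that binary-mode workaround, for illustration only.
The sample bytes, the file name, and the choice of ASCII decoding are
assumptions, not the email package's actual test code:

```python
import os
import tempfile

# Hypothetical sample message with CRLF line endings (not from the real tests).
raw = b"Subject: test\r\n\r\nbody line\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "msg.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Binary mode performs no newline translation, so \r\n survives the read.
    with open(path, "rb") as f:
        data = f.read()

# Hand-rolled bytes-to-str conversion. Assuming pure ASCII is the fragile
# part: any non-ASCII byte would raise UnicodeDecodeError here.
text = data.decode("ascii")

# Split into lines while keeping each line's original ending.
lines = text.splitlines(keepends=True)
```

The EOLs are preserved, but only because the decoding step is done by
hand after the read, which is exactly the cruft a text-mode option
would avoid.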

It would have been perfect, I think, if I could have opened the file  
in text mode so that read() gave me strings, with universal newlines  
and preservation of line endings (i.e. no translation to \n).
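
As a sketch of the behavior wished for here: released versions of
Python 3 accept newline="" in open(), which keeps universal newline
recognition on input but returns the line endings untranslated.  That
argument postdates this message; the sample data and file name below
are made up for illustration:

```python
import os
import tempfile

# Hypothetical file mixing all three line-ending conventions.
raw = b"unix\nwindows\r\nold mac\rlast\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mixed.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Default text mode: every recognized line ending is translated to \n.
    with open(path, "r", encoding="ascii") as f:
        translated = f.readlines()

    # newline="": lines are still split on \n, \r\n and \r,
    # but each ending is handed back exactly as it appears in the file.
    with open(path, "r", encoding="ascii", newline="") as f:
        preserved = f.readlines()

print(translated)
print(preserved)
```

Both reads yield str rather than bytes; only the second keeps the
per-line endings, which also covers the "which lines used which
ending" case raised in the quoted text.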

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsG1CXEjvBPtnXfVAQKF3AP/X+/E44KI2EB3w0i3N5cGBCajJbMV93fk
j2S/lfQf4tjBH3ZFEhUnybcJxsNukYY65T4MdzKh+IgJHV5s0rQtl2Hzr85e7Y0O
i5Z3N4TAKc11PjSIk6vKrkgwPCEMzvwIQ5DFxeQBF5kOF6cZuXKaeDzB6z/GBYNv
YiJEnOeZkW8=
=u6OL
-----END PGP SIGNATURE-----

