[Python-Dev] [python] Re: New lines, carriage returns, and Windows

Guido van Rossum guido at python.org
Mon Oct 1 04:14:59 CEST 2007


On 9/30/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Michael Foord wrote:
> > We stick to using the .NET file I/O and so don't
> > have a problem. The only time it is an issue for us is our tests, where
> > we have string literals in our test code (where new lines are obviously
> > '\n')
>
> If you're going to do that, you really need to be consistent
> about it and have IronPython use \r\n internally for line endings
> *everywhere*, including string literals.

I don't know what you mean by "internally". There's lots of portable
code that uses the \n character in string literals (either to generate
line endings or to recognize them). That code can't suddenly be made
invalid. And changing all string literals that say "\n" to secretly
become "\r\n" would be worse than the \r <--> \n swap that some old
Apple tools used to do. (If len("\n") == 2, what would len("\r\n")
be?)
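To make the length argument concrete, here is a small illustration (not from the original thread). In Python, "\n" is a single character (LF, code point 10) and "\r\n" is two (CR, LF). A lot of portable code quietly depends on those lengths, for example when it strips a trailing newline by slicing.

```python
# "\n" is exactly one character; "\r\n" is two.
newline = "\n"
crlf = "\r\n"

print(len(newline))            # 1
print(len(crlf))               # 2
print(ord(newline))            # 10 (LF)
print([ord(c) for c in crlf])  # [13, 10] (CR, LF)

# Common, portable code that assumes "\n" has length 1.
line = "hello\n"
if line.endswith("\n"):
    line = line[:-1]
print(repr(line))              # 'hello'
```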

> > It is just slightly ironic that the time Python 'gets it wrong' (for
> > some value of wrong) is when you are using text mode for I/O :-)
>
> I would say IronPython is getting it wrong by using inconsistent
> internal representations of line endings.

Honestly, I find it hard to see much merit in this discussion. A
number of Python libraries, including print() and io.py, use \n to
represent line endings in memory, and translate these to/from
platform-appropriate line endings when reading/writing text files.
OTOH, some other APIs, for example, sockets talking various internet
protocols (from SMTP to HTTP) as well as most (all?) native .NET APIs,
use \r\n to represent line endings. There are any number of ways to
convert between these conventions, including various invocations of
s.replace() and s.splitlines() (the latter does a
universal-newlines-like thing). Applications can take care of this,
and APIs can choose to use either convention for line endings (or
both, in the case of input).
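A minimal sketch of both approaches described above. It uses the `io` module as it exists in current Python 3 (the descendant of the io.py mentioned here). The text layer translates "\n" to a chosen line ending on write and, with `newline=None` (universal newlines), back to "\n" on read. The explicit route uses `str.replace()` and `str.splitlines()`. The sample strings are illustrative assumptions.

```python
import io

# Text-layer translation: keep "\n" in memory, emit "\r\n" as bytes.
raw = io.BytesIO()
writer = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")
writer.write("first\nsecond\n")
writer.flush()
encoded = raw.getvalue()
print(encoded)          # b'first\r\nsecond\r\n'

# Reading with newline=None (universal newlines) maps "\r\n" back to "\n".
reader = io.TextIOWrapper(io.BytesIO(encoded), encoding="ascii", newline=None)
decoded = reader.read()
print(repr(decoded))    # 'first\nsecond\n'

# Explicit conversion, e.g. for a protocol that wants CRLF on the wire.
# Assumes `text` contains only bare "\n"; an existing "\r\n" would be doubled.
text = "first\nsecond\n"
wire = text.replace("\n", "\r\n")
print(repr(wire))       # 'first\r\nsecond\r\n'

# splitlines() treats "\r\n", "\n" and "\r" (among others) as line
# boundaries, so it normalizes mixed input.
mixed = "a\r\nb\nc\rd"
lines = mixed.splitlines()
print(lines)            # ['a', 'b', 'c', 'd']
back_to_lf = "\n".join(lines) + "\n"
```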

Yes, occasionally users get confused. Too bad. They'll have to learn
about this issue. The issue isn't going away by wishing it to go away;
it is a fundamental difference between Windows and Unix, and neither
is likely to change or disappear. Changing Python to use the Windows
convention internally isn't going to help one bit. Changing Python to
use the platform's convention is impossible without introducing a new
string escape that would mean \r\n on Windows and \n on Unix; and
given that there are legitimate reasons to sometimes deal with \r\n
explicitly even on Unix (and with just \n even on Windows) we wouldn't
be completely isolated from the issue. Changing APIs to not represent
the line ending as a character (as the Java I/O libraries do) would be
too big a change (and how would we distinguish between readline()
returning an empty line and EOF?) -- and I'm sure the issue still pops
up in plenty of places in Java.
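The readline() point can be shown in a few lines (an illustration with made-up data, using `io.StringIO` as the file). Because the line ending is kept as a character, an empty line comes back as "\n" and only EOF yields "". If the endings were stripped, as in the last step, the two cases would become indistinguishable.

```python
import io

# A "file" whose second line is empty.
f = io.StringIO("one\n\ntwo\n")

results = []
while True:
    line = f.readline()
    if line == "":
        # Only end-of-file returns the empty string.
        break
    # An empty line in the file still comes back as "\n".
    results.append(line)

print(results)   # ['one\n', '\n', 'two\n']

# Without the line-ending character, the empty line looks just like EOF.
stripped = [l.rstrip("\n") for l in results]
print(stripped)  # ['one', '', 'two']
```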

The best solution for IronPython is probably to have the occasional
wrapper around .NET APIs that translates between \r\n and \n on the
boundary between Python and .NET; but one must be able to turn this
off or bypass the wrappers in cases where the data retrieved from one
.NET API is just passed straight on to another .NET API (and the
translation would just cause two redundant copies to be made).
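A rough sketch of the kind of boundary wrapper suggested above, under stated assumptions. The "\.NET APIs" are hypothetical plain-Python stand-ins (`fake_dotnet_read_text`, `fake_dotnet_write_text`), and the wrapper helpers (`wrap_reader`, `wrap_writer`) are illustrative names, not part of IronPython. The `translate` flag is one possible way to provide the bypass for data that only travels from one .NET API to another.

```python
from typing import Callable


# Hypothetical stand-ins for native .NET APIs that use CRLF line endings.
def fake_dotnet_read_text() -> str:
    return "line one\r\nline two\r\n"


def fake_dotnet_write_text(text: str) -> str:
    # Pretend .NET consumer; returns what it received so we can inspect it.
    return text


def to_python_newlines(s: str) -> str:
    # .NET convention (CRLF) -> Python in-memory convention (LF).
    return s.replace("\r\n", "\n")


def to_dotnet_newlines(s: str) -> str:
    # Normalize existing CRLF to LF first, so it is not doubled into CR CR LF,
    # then convert every LF to CRLF.
    return s.replace("\r\n", "\n").replace("\n", "\r\n")


def wrap_reader(api: Callable[[], str], translate: bool = True) -> Callable[[], str]:
    """Wrap a text-returning API; translate=False bypasses the conversion."""
    def wrapper() -> str:
        result = api()
        return to_python_newlines(result) if translate else result
    return wrapper


def wrap_writer(api: Callable[[str], str], translate: bool = True) -> Callable[[str], str]:
    """Wrap a text-accepting API; translate=False passes text through as-is."""
    def wrapper(text: str) -> str:
        return api(to_dotnet_newlines(text) if translate else text)
    return wrapper


# Python-facing code sees "\n"; .NET-facing code sees "\r\n".
read_py = wrap_reader(fake_dotnet_read_text)
write_py = wrap_writer(fake_dotnet_write_text)
py_text = read_py()
sent = write_py("a\nb\n")

# Bypass: pass .NET data straight to another .NET API without two
# redundant translation copies.
read_raw = wrap_reader(fake_dotnet_read_text, translate=False)
write_raw = wrap_writer(fake_dotnet_write_text, translate=False)
passed_through = write_raw(read_raw())
```

Keeping the bypass as a flag on the wrapper, rather than as a separate code path, is just one design choice. Exposing the unwrapped .NET call directly would serve the same purpose.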

Get used to it. End of discussion.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

