when does newlines get set in universal newlines mode?

Peter Otten __peter__ at web.de
Mon May 4 11:17:21 EDT 2015


Chris Angelico wrote:

> On Mon, May 4, 2015 at 10:01 PM, Peter Otten <__peter__ at web.de> wrote:
>> I tried:
>>
>>>>> with open("tmp.txt", "wb") as f: f.write("alpha\r\nbeta\rgamma\n")
>> ...
>>>>> f = open("tmp.txt", "rU")
>>>>> f.newlines
>>>>> f.readline()
>> 'alpha\n'
>>>>> f.newlines
>> # expected: '\r\n'
>>>>> f.readline()
>> 'beta\n'
>>>>> f.newlines
>> '\r\n' # expected: ('\r', '\r\n')
>>>>> f.readline()
>> 'gamma\n'
>>>>> f.newlines
>> ('\r', '\n', '\r\n')
>>
>> I believe this is a bug.
> 
> I'm not sure it is, actually; imagine the text is coming in one
> character at a time (e.g. from a pipe), and it's seen "alpha\r". It
> knows that this is a line, so it emits it; but until the next
> character is read, it can't know whether it's going to be \r or \r\n.
> What should it do? Read another character, which might block? Put "\r"
> into .newlines, which might be wrong? Once it sees the \n, it knows
> that it was \r\n (or rather, it assumes that files do not have lines
> of text terminated by \r followed by blank lines terminated by \n -
> because that would be stupid).
> 
> It may be worth documenting this limitation, but it's not something
> that can easily be fixed without removing support for \r newlines -
> although that might be an option, given that non-OSX Macs are
> basically history now.
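
Chris's point can be shown with a toy model. The class below, UniversalLineSplitter, is a hypothetical name and is not CPython's file-object code. It receives one character at a time, as if reading from a pipe. It emits a line as soon as it sees "\r" or "\n", but it records a "\r" terminator only after the next character confirms that the "\r" was not the first half of "\r\n". On Peter's data, the recorded kinds lag one line behind, which matches his session:

```python
class UniversalLineSplitter:
    r"""Toy push-based universal-newline splitter (hypothetical, for illustration).

    A line ending in '\r' is emitted immediately, but its newline kind
    stays undecided until the next character arrives.
    """

    def __init__(self):
        self.seen = set()          # newline kinds recorded so far
        self._buf = []             # characters of the current line
        self._pending_cr = False   # last char was '\r'; kind not yet known

    def feed(self, ch):
        """Consume one character; return a completed line or None."""
        if self._pending_cr:
            self._pending_cr = False
            if ch == "\n":
                # The '\r' already emitted was the first half of '\r\n'.
                self.seen.add("\r\n")
                return None
            self.seen.add("\r")    # anything else means it was a bare '\r'
        if ch == "\r":
            self._pending_cr = True
            return self._finish_line()
        if ch == "\n":
            self.seen.add("\n")
            return self._finish_line()
        self._buf.append(ch)
        return None

    def close(self):
        """Signal end of input; return any unterminated last line."""
        if self._pending_cr:       # nothing followed, so it was a bare '\r'
            self._pending_cr = False
            self.seen.add("\r")
        if self._buf:
            line = "".join(self._buf)
            self._buf = []
            return line
        return None

    def _finish_line(self):
        line = "".join(self._buf) + "\n"   # every terminator becomes '\n'
        self._buf = []
        return line


splitter = UniversalLineSplitter()
for ch in "alpha\r\nbeta\rgamma\n":
    line = splitter.feed(ch)
    if line is not None:
        print(repr(line), sorted(splitter.seen))
# 'alpha\n' []
# 'beta\n' ['\r\n']
# 'gamma\n' ['\n', '\r', '\r\n']
```

Settling the pending "\r" takes exactly the extra read Chris describes. The only alternative is to guess, which is why the sketch records nothing after the first line.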

OK, you convinced me. Then I tried:

>>> with open("tmp.txt", "wb") as f: f.write("0\r\n3\r5\n7")
... 
>>> assert len(open("tmp.txt", "rb").read()) == 8
>>> f = open("tmp.txt", "rU")
>>> f.readline()
'0\n'
>>> f.newlines
>>> f.tell()
3
>>> f.newlines
'\r\n'

Hm, so tell() moves the file pointer? Is that sane?
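
One reading of the session above is that tell() settles the same pending "\r". If the next byte is "\n", tell() consumes it, records a "\r\n" terminator, and reports the position after it. Otherwise it puts the byte back. The toy reader below models that assumption. ToyUniversalReader and its methods are hypothetical and are not the Python 2 file API, but the model reproduces the 3 and the '\r\n' shown above:

```python
import io


class ToyUniversalReader:
    r"""Hypothetical model of a universal-newlines reader over raw bytes.

    readline() stops right after a '\r' without looking ahead, leaving the
    newline kind undecided; tell() then peeks one byte to settle it.
    """

    _ORDER = ("\r", "\n", "\r\n")   # reporting order for `newlines`

    def __init__(self, data):
        self._raw = io.BytesIO(data)
        self._skip_next_lf = False   # a '\r' ended the last line
        self._seen = set()           # newline kinds recorded so far

    @property
    def newlines(self):
        kinds = tuple(k for k in self._ORDER if k in self._seen)
        if not kinds:
            return None
        return kinds[0] if len(kinds) == 1 else kinds

    def readline(self):
        out = []
        while True:
            b = self._raw.read(1)
            if not b:                        # EOF: return what we have
                return "".join(out)
            ch = b.decode("latin-1")
            if self._skip_next_lf:
                self._skip_next_lf = False
                if ch == "\n":               # second half of '\r\n': swallow it
                    self._seen.add("\r\n")
                    continue
                self._seen.add("\r")         # previous '\r' was a bare '\r'
            if ch == "\r":
                self._skip_next_lf = True    # do not peek; decide later
                return "".join(out) + "\n"
            if ch == "\n":
                self._seen.add("\n")
                return "".join(out) + "\n"
            out.append(ch)

    def tell(self):
        pos = self._raw.tell()
        if self._skip_next_lf:
            nxt = self._raw.read(1)          # peek one byte
            if nxt == b"\n":                 # it was '\r\n': consume the '\n'
                self._skip_next_lf = False
                self._seen.add("\r\n")
                pos += 1
            elif nxt:                        # anything else: put it back
                self._raw.seek(-1, io.SEEK_CUR)
        return pos


r = ToyUniversalReader(b"0\r\n3\r5\n7")
print(repr(r.readline()))   # '0\n'
print(r.newlines)           # None
print(r.tell())             # 3
print(repr(r.newlines))     # '\r\n'
```

In this model readline() never looks past a "\r", so it cannot block on a pipe. The cost is that tell(), which looks like a pure query, may read a byte and change both the position and newlines.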




More information about the Python-list mailing list