[Python-ideas] Iterating non-newline-separated files should be easier

Sat Jul 26 06:09:41 CEST 2014

On Jul 25, 2014, at 19:13, Akira Li <4kir4.1i at gmail.com> wrote:

> I've added a patch that demonstrates "no translation" for alternative
> newlines behavior http://bugs.python.org/issue1152248#msg224016

Having taken a better look at the line buffering code, I now agree with you that this is necessary; otherwise we'd have to make a much bigger change to the implementation (which I don't think we want).

When I update the draft PEP I'll change that and add a rationale (this also makes the rationale for "no translation for binary files" and for "only readnl is exposed, not writenl" a lot simpler). 

I'll also change it in my C patch (which I hope to be able to clean up and upload this weekend).

> Andrew Barnert
> <abarnert at yahoo.com.dmarc.invalid> writes:
> 
>> On Thursday, July 24, 2014 2:08 AM, Akira Li
>> <4kir4.1i at gmail.com> wrote:
>> 
>>>> Andrew Barnert <abarnert at yahoo.com> writes:
>>> 
>>>> On Jul 23, 2014, at 5:13, Akira Li
>>>> <4kir4.1i at gmail.com> wrote:
>>>>> In order for the newline="\0" case to work, it should behave
>>>>> similarly to the newline='' or newline='\n' case instead, i.e., no
>>>>> translation should take place, to avoid corrupting embedded "\n\r"
>>>>> characters.
>>>> 
>>>> The draft PEP discusses this. I think it would be more consistent to
>>>> translate for \0, just like \r and \r\n.
>>> 
>>> I read the [draft]. No translation is a better choice here. Otherwise
>>> (at the very least) it breaks the `find -print0` use case.
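
For context on the `find -print0` use case: with the io module as it stands today, NUL-separated records have to be split by hand. A minimal sketch, assuming a text stream opened without newline translation; the helper name iter_records and its parameters are made up for illustration and are not part of any proposal in this thread:

```python
import io


def iter_records(stream, sep="\0", chunk_size=4096):
    """Yield sep-terminated records from a text stream, separator removed.

    Hypothetical helper: reads fixed-size chunks and splits them by hand,
    so embedded '\n' and '\r' characters are passed through untouched.
    """
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last separator is complete; the tail may
        # be a partial record and is carried over to the next chunk.
        *complete, pending = pending.split(sep)
        yield from complete
    if pending:
        # Final record without a trailing separator.
        yield pending


# Simulated `find -print0` output with newline characters inside names.
sample = io.StringIO("a\nb.txt\0c\r.txt\0", newline="")
names = list(iter_records(sample))

# Same data read in tiny chunks, to exercise the carry-over path.
small = list(iter_records(io.StringIO("a\nb.txt\0c\r.txt\0", newline=""),
                          chunk_size=3))
```

Here names comes out as the two file names with their embedded '\n' and '\r' intact, which is the property the "no translation" argument is about.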
>> 
>> No it doesn't. The only reason it breaks your code is that you add
>> newline='\0' to your stdout wrapper as well as your stdin wrapper. If
>> you just passed '', it would not do anything. And this is exactly
>> parallel with the existing case with, e.g., trying to pass through a
>> classic-Mac file full of '\r'-delimited strings that might contain
>> embedded '\n' characters that you don't want to translate.
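
The existing output-side behaviour behind this classic-Mac comparison can be reproduced with the current io module. A small sketch, where the helper name write_text is an assumption for illustration: on output, newline='\r' rewrites every '\n' to '\r', while newline='' writes the text through unchanged.

```python
import io


def write_text(text, newline):
    """Write text through a TextIOWrapper and return the resulting bytes.

    Hypothetical helper used only to compare newline modes on output.
    """
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    return raw.getvalue()


# '\r'-delimited records, the first of which contains an embedded '\n'.
records = "first\nstill first\rsecond\r"

translated = write_text(records, "\r")  # embedded '\n' becomes '\r'
untouched = write_text(records, "")     # no translation on output
```

With newline='\r' the embedded '\n' is indistinguishable from a record separator afterwards; with newline='' the bytes match the input.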
> 
> I won't repeat it several times but as you've already found out newline='\0'
> for stdout (at the very least) can be useful for line_buffering=True
> behavior.
> 
> ...
>>> There is also line_buffering parameter. From the docs:
>>> 
>>>   If line_buffering is True, flush() is implied when a call to write
>>>   contains a newline character.
>> 
>> The way this is actually defined seems broken to me; IIRC (I'll check
>> the code later) it flushes on any '\r', and on any translated
>> '\n'. So, it's doing the wrong thing with '\r' in most modes, and with
>> '\n' in '' mode on non-Unix systems. So my thought was, just leave it
>> broken.
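
The flushing behaviour can be checked against whichever interpreter is at hand. A rough sketch, assuming a BufferedWriter over a BytesIO so that data only reaches the raw stream when a flush happens; the helper name reaches_raw is made up for illustration:

```python
import io


def reaches_raw(text, newline):
    """Return True if a single write() of text reached the raw stream.

    Hypothetical helper: with line_buffering=True, data shows up in the
    underlying BytesIO only if write() triggered an implicit flush.
    """
    raw = io.BytesIO()
    buffered = io.BufferedWriter(raw)
    wrapper = io.TextIOWrapper(buffered, encoding="ascii",
                               newline=newline, line_buffering=True)
    wrapper.write(text)
    return raw.getvalue() != b""


no_newline = reaches_raw("abc", "\n")       # nothing to trigger a flush
cr_in_lf_mode = reaches_raw("abc\r", "\n")  # '\r' is not the newline here
lf_in_raw_mode = reaches_raw("abc\n", "")   # '' mode, no translation
```

On current CPython the last two both report a flush, which matches the "excessive but never insufficient" reading: a '\r' flushes even when it is not the configured newline.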
> 
> Yes. I've found at least one issue http://bugs.python.org/issue22069
> 
>> But now that I think about it, the existing code can only flush
>> excessively, never insufficiently, and that's probably a property
>> worth preserving. So maybe there _is_ a reason to pass newline for
>> output without translation after all. In other words, the parameter
>> may actually conflate _four_ things, not just three...
>> 
>> I'll need to think this through (and reread the code) this weekend;
>> thanks for bringing it up.
> 
> 
> --
> Akira