[Python-ideas] Iterating non-newline-separated files should be easier
Andrew Barnert
abarnert at yahoo.com
Sat Jul 26 06:09:41 CEST 2014
On Jul 25, 2014, at 19:13, Akira Li <4kir4.1i at gmail.com> wrote:
> I've added a patch that demonstrates "no translation" for alternative
> newlines behavior http://bugs.python.org/issue1152248#msg224016
Having taken a better look at the line buffering code, I now agree with you that this is necessary; otherwise we'd have to make a much bigger change to the implementation (which I don't think we want).
When I update the draft PEP I'll change that and add a rationale (this also makes the rationale for "no translation for binary files" and for "only readnl is exposed, not writenl" a lot simpler).
I'll also change it in my C patch (which I hope to be able to clean up and upload this weekend).
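
For anyone following along, here is a minimal sketch of the existing line_buffering behavior that motivates this. It is not code from either patch. RecordingRaw and flushed_after are hypothetical names made up for the illustration; the only real API used is io.TextIOWrapper with the current newline='' and line_buffering=True arguments. It records what has reached the raw stream after each write:

```python
import io


class RecordingRaw(io.RawIOBase):
    """Hypothetical raw stream that records every chunk it is asked to write."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)


def flushed_after(writes):
    """Return the bytes that had reached the raw stream after each write."""
    raw = RecordingRaw()
    text = io.TextIOWrapper(io.BufferedWriter(raw), encoding="ascii",
                            newline="", line_buffering=True)
    seen = []
    for s in writes:
        text.write(s)
        seen.append(b"".join(raw.chunks))
    return seen


# '\0'-terminated writes stay buffered; '\n' and '\r' each trigger a flush.
for snapshot in flushed_after(["one\0", "two\n", "three\0", "four\r"]):
    print(snapshot)
```

As far as I can tell, a '\0'-terminated record only reaches the raw stream once a later write happens to contain '\n' or '\r'. That is why telling an output wrapper about the separator can be useful even when no translation takes place.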
> Andrew Barnert
> <abarnert at yahoo.com.dmarc.invalid> writes:
>
>> On Thursday, July 24, 2014 2:08 AM, Akira Li
>> <4kir4.1i at gmail.com> wrote:
>>
>>>> Andrew Barnert <abarnert at yahoo.com> writes:
>>>
>>>> On Jul 23, 2014, at 5:13, Akira Li
>>>> <4kir4.1i at gmail.com> wrote:
>>>>> For the newline="\0" case to work, it should behave similarly to the
>>>>> newline='' or newline='\n' case instead, i.e., no translation should
>>>>> take place, to avoid corrupting embedded "\n\r" characters.
>>>>
>>>> The draft PEP discusses this. I think it would be more consistent to
>>>> translate for \0, just like \r and \r\n.
>>>
>>> I read the [draft]. No translation is a better choice here. Otherwise
>>> (at the very least) it breaks the `find -print0` use case.
>>
>> No it doesn't. The only reason it breaks your code is that you add
>> newline='\0' to your stdout wrapper as well as your stdin wrapper. If
>> you just passed '', it would not do anything. And this is exactly
>> parallel with the existing case with, e.g., trying to pass through a
>> classic-Mac file full of '\r'-delimited strings that might contain
>> embedded '\n' characters that you don't want to translate.
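
To make the pass-through point concrete, here is a rough sketch that uses only what io supports today; the proposed newline='\0' value is deliberately not used. iter_records and read_records are hypothetical helpers written for this illustration. They split a text stream on NUL by hand, so the only thing that varies is whether the wrapper translates embedded '\r' and '\r\n':

```python
import io


def iter_records(stream, sep="\0", chunk_size=8192):
    """Yield sep-terminated records from a text stream, without the separator."""
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last separator is complete; keep the tail.
        *records, pending = pending.split(sep)
        yield from records
    if pending:
        yield pending  # final record with no trailing separator


def read_records(data, newline):
    """Decode bytes with the given newline mode and split them on NUL."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="ascii",
                              newline=newline)
    return list(iter_records(stream))


data = b"a\r\nb\0c\rd\0"
print(read_records(data, newline=""))    # embedded '\r\n' and '\r' preserved
print(read_records(data, newline=None))  # universal newlines rewrite them
```

With newline='' the embedded characters come through untouched, while the default universal-newlines mode rewrites them to '\n'. That rewriting is the kind of corruption a `find -print0` style pipeline needs to avoid.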
>
> I won't repeat it several times but as you've already found out newline='\0'
> for stdout (at the very least) can be useful for line_buffering=True
> behavior.
>
> ...
>>> There is also line_buffering parameter. From the docs:
>>>
>>> If line_buffering is True, flush() is implied when a call to write
>>> contains a newline character.
>>
>> The way this is actually defined seems broken to me; IIRC (I'll check
>> the code later) it flushes on any '\r', and on any translated
>> '\n'. So, it's doing the wrong thing with '\r' in most modes, and with
>> '\n' in '' mode on non-Unix systems. So my thought was, just leave it
>> broken.
>
> Yes. I've found at least one issue http://bugs.python.org/issue22069
>
>> But now that I think about it, the existing code can only flush
>> excessively, never insufficiently, and that's probably a property
>> worth preserving. So maybe there _is_ a reason to pass newline for
>> output without translation after all. In other words, the parameter
>> may actually conflate _four_ things, not just three...
>>
>> I'll need to think this through (and reread the code) this weekend;
>> thanks for bringing it up.
>
>
> --
> Akira
>