[Python-ideas] Iterating non-newline-separated files should be easier

Andrew Barnert abarnert at yahoo.com
Fri Jul 18 06:23:05 CEST 2014


On Jul 17, 2014, at 20:36, Chris Angelico <rosuav at gmail.com> wrote:

> On Fri, Jul 18, 2014 at 1:21 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>> You seem to be talking about the implementation of the change, but what
>> is the interface? Having made all these changes, how does it affect
>> Python code? You have a use-case of splitting on something other than
>> the standard newlines, so how does one do that? E.g. suppose I have a
>> file "spam.txt" which uses NEL (Next Line, U+0085) as the end of line
>> character. How would I iterate over lines in this file?
> 
> The way I understand it is this:
> 
> for line in open("spam.txt", newline="\u0085"):
>    process(line)
> 
> If that's the case, I would be strongly in favour of this. Nice and
> clean, and should break nothing; there'll be special cases for
> newline=None and newline='', and the only change is that, instead of a
> small number of permitted values ('\n', '\r', '\r\n'), any string (or
> maybe any one-character string plus '\r\n'?) would be permitted.
> 
> Effectively, it's not "iterate over this file, divided by \0 instead
> of newlines", but it's "this file uses the unusual encoding of
> newline=\0, now iterate over lines in the file". Seems a smart way to
> do it IMO.

Exactly. As soon as Alexander suggested it, I immediately knew it was much better than my original idea.

(Apologies for overestimating the obviousness of that.)
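For context, the `newline` parameter of `open()` currently accepts only `None`, `''`, `'\n'`, `'\r'` and `'\r\n'`, and any other value raises `ValueError`. The `newline="\u0085"` loop quoted above is therefore proposal syntax, not working code. The following is a minimal sketch of the iteration behaviour being proposed, written as a plain generator. The name `iter_lines` and its `chunk_size` parameter are hypothetical and are not part of Python or of the proposal. As with `readline()`, it assumes each record keeps its trailing separator and that a final record with no separator is still yielded.

```python
import io
from typing import Iterator, TextIO


def iter_lines(f: TextIO, sep: str, chunk_size: int = 8192) -> Iterator[str]:
    """Hypothetical helper: yield records from f, each ending with sep.

    Mimics what iterating a file opened with a custom newline might do:
    the separator is kept on each record (like '\n' from readline()),
    and a trailing record without a separator is yielded at EOF.
    """
    if not sep:
        raise ValueError("separator must be a non-empty string")
    buf = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        # Emit every complete record currently in the buffer. A
        # multi-character separator split across two chunks stays in
        # buf and is found once the next chunk has been appended.
        start = 0
        while True:
            idx = buf.find(sep, start)
            if idx == -1:
                break
            end = idx + len(sep)
            yield buf[start:end]
            start = end
        buf = buf[start:]
    if buf:
        # Last record with no trailing separator.
        yield buf


if __name__ == "__main__":
    # Simulated NEL-separated file contents.
    data = io.StringIO("spam\u0085eggs\u0085ham", newline="")
    for line in iter_lines(data, "\u0085"):
        print(repr(line))
```

With a real file you would open it with `newline=""` so that no `'\r'`/`'\r\n'` translation happens before splitting. For example, `with open("spam.txt", encoding="utf-8", newline="") as f:` and then `for line in iter_lines(f, "\u0085"): process(line)`. The UTF-8 encoding is an assumption here.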