[Python-ideas] Iterating non-newline-separated files should be easier

Chris Angelico rosuav at gmail.com
Fri Jul 18 05:36:17 CEST 2014


On Fri, Jul 18, 2014 at 1:21 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Thu, Jul 17, 2014 at 05:04:00PM -0700, Andrew Barnert wrote:
>
>> It turns out to be even simpler than I expected.
>>
>> I reused the "newline" parameter of open and TextIOWrapper.__init__,
>> adding a param of the same name to the constructors for
>> BufferedReader, BufferedWriter, BufferedRWPair, BufferedRandom, and
>> FileIO.
>>
>> For text files, just remove the check for newline being one of the
>> standard values and it all works. For binary files, remove the check
>> for truthy, make open pass each Buffered* constructor newline=(newline
>> if binary else None), make each Buffered* class store it, and change
>> two lines in RawIOBase.readline to use it. And that's it.
>
> All the words are in English, but I have no idea what you're actually
> saying... :-)
>
> You seem to be talking about the implementation of the change, but what
> is the interface? Having made all these changes, how does it affect
> Python code? You have a use-case of splitting on something other than
> the standard newlines, so how does one do that? E.g. suppose I have a
> file "spam.txt" which uses NEL (Next Line, U+0085) as the end of line
> character. How would I iterate over lines in this file?

The way I understand it is this:

for line in open("spam.txt", newline="\u0085"):
    process(line)

If that's the case, I would be strongly in favour of this. Nice and
clean, and should break nothing; there'll be special cases for
newline=None and newline='', and the only change is that, instead of a
small number of permitted values ('\n', '\r', '\r\n'), any string (or
maybe any one-character string plus '\r\n'?) would be permitted.
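
For reference, the check that would be relaxed can be observed from
Python code today. This is a minimal sketch, assuming a current
CPython where the proposal has not been applied; accepts_newline is a
hypothetical helper name, not part of the io module.

```python
import io

def accepts_newline(value):
    """Return True if TextIOWrapper accepts `value` as its newline argument."""
    try:
        # Wrapping an empty in-memory buffer is enough to trigger the check.
        io.TextIOWrapper(io.BytesIO(), encoding="utf-8", newline=value)
    except ValueError:
        # Raised for any string outside the standard set of values.
        return False
    return True

for candidate in (None, "", "\n", "\r", "\r\n", "\u0085", "\0"):
    print(repr(candidate), accepts_newline(candidate))
```

Under the proposal, the last two candidates would be accepted as well.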

Effectively, it's not "iterate over this file, divided by \0 instead
of newlines"; it's "this file uses the unusual newline convention of
newline=\0, now iterate over lines in the file". Seems a smart way to
do it IMO.
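
Until something like this exists, the same "lines with an unusual
newline" behaviour can be approximated with a small generator. This is
only a sketch, not Andrew's patch: iter_lines and its chunk_size
parameter are hypothetical names, and it assumes a text-mode file
object (ideally opened with newline='' so no translation happens
first).

```python
import io

def iter_lines(f, newline, chunk_size=8192):
    """Yield records from text file object `f`, split on the string `newline`.

    Each record keeps its trailing separator, like ordinary line iteration.
    """
    if not newline:
        raise ValueError("newline must be a non-empty string")
    buf = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        start = 0
        while True:
            end = buf.find(newline, start)
            if end == -1:
                break
            end += len(newline)
            yield buf[start:end]
            start = end
        # Keep the unterminated tail; it may hold part of a separator
        # that is completed by the next chunk.
        buf = buf[start:]
    if buf:
        # Final record with no trailing separator.
        yield buf

sample = io.StringIO("spam\u0085eggs\u0085ham")
for line in iter_lines(sample, "\u0085"):
    print(repr(line))
```

With a real file this would be used as
iter_lines(open("spam.txt", newline=""), "\u0085"), which is wordier
than the newline= spelling above, and that is rather the point.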

ChrisA

