[Python-ideas] Iterating non-newline-separated files should be easier

Nick Coghlan ncoghlan at gmail.com
Sun Jul 20 01:56:18 CEST 2014


On 20 Jul 2014 09:49, "Nick Coghlan" <ncoghlan at gmail.com> wrote:
>
>
> On 20 Jul 2014 09:28, "Andrew Barnert" <abarnert at yahoo.com> wrote:
> >
> > (replies to multiple messages here)
> >
> > On Saturday, July 19, 2014 1:19 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> >
> >
> > >On 19 July 2014 03:32, Chris Angelico <rosuav at gmail.com> wrote:
> > >> On Sat, Jul 19, 2014 at 5:10 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > >>> I still favour my proposal there to add a separate "readrecords()"
> > >>> method, rather than reusing the line based iteration methods - lines
> > >>> and arbitrary records *aren't* the same thing
> > >>
> > >> But they might well be the same thing. Look at all the Unix commands
> > >> that usually separate output with \n, but can be told to separate
> > >> with \0 instead. If you're reading from something like that, it
> > >> should be just as easy to split on \n as on \0.
> > >
> > >Python isn't Unix, and Python has never supported \0 as a "line
> > >ending".
> >
> > Well, yeah, but Python is used on Unix, and it's used to write scripts
> > that interoperate with other Unix command-line tools.
> >
> > For the record, the reason this came up is that someone was trying to
> > use one of my scripts in a pipeline with find -0, and he had no problem
> > adapting the Perl scripts he's using to handle -0 output, but no clue
> > how to do the same with my Python script.
> >
> > In general, it's just as easy to write Unix command-line tools in
> > Python as in Perl, and that's a good thing: it means I don't have to
> > use Perl. But as soon as -0 comes into the mix, that's no longer true.
> > And that's a problem.
>
> I would find adding NUL ('\0') to the potential newline set significantly
> less objectionable than opening it up to arbitrary character sequences.
>
> Adding a single possible newline character is a much simpler change, and
> one likely to have far fewer odd consequences. This is especially so if
> specifying NUL as the line separator is only permitted for files opened
> in binary mode.

Also, the interoperability argument is a good one, as is the analogy with
'\r'. Since this does end up touching the open() builtin and the core IO
abstractions, it will need a PEP.

As far as implementation goes, I suspect a RecordIOWrapper layered IO model
inspired by the approach used for TextIOWrapper may make sense.
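
To make the layering idea a little more concrete, here is a minimal sketch
rather than a proposed API. The RecordIOWrapper class, its readrecords()
method (the name is borrowed from the suggestion quoted above), and the
separator and chunk_size parameters are all hypothetical; nothing like
this exists in the io module. It assumes a blocking binary stream and
yields records with the separator stripped.

```python
import io


class RecordIOWrapper:
    """Hypothetical wrapper that splits a binary stream into records.

    Illustration only: this class is not part of the standard library.
    """

    def __init__(self, raw, separator=b"\0", chunk_size=8192):
        if not separator:
            raise ValueError("separator must be non-empty")
        self._raw = raw                # underlying binary stream
        self._separator = separator    # bytes that terminate each record
        self._chunk_size = chunk_size  # bytes requested per read() call

    def readrecords(self):
        """Yield each record as bytes, without the separator."""
        buffer = b""
        while True:
            chunk = self._raw.read(self._chunk_size)
            if not chunk:
                break
            buffer += chunk
            # Everything before the last separator is a complete record;
            # the remainder stays buffered, so a multi-byte separator that
            # straddles two chunks is still recognised on the next pass.
            *records, buffer = buffer.split(self._separator)
            yield from records
        if buffer:
            # Final record that was not terminated by a separator.
            yield buffer

    __iter__ = readrecords


if __name__ == "__main__":
    demo = io.BytesIO(b"a.txt\0dir/b c.txt\0last")
    for record in RecordIOWrapper(demo, chunk_size=4):
        print(record)
```

In a pipeline such as "find . -print0 | python script.py", the same
wrapper could be pointed at sys.stdin.buffer instead of the BytesIO
object. A real design would also have to settle text decoding, writing,
and seeking, which this sketch leaves out.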

Cheers,
Nick.

>
> Cheers,
> Nick.