Reading in cooked mode (was Re: Python MSI not installing, log file showing name of a Vietnamese communist revolutionary)

Cameron Simpson cs at zip.com.au
Sat Mar 22 22:16:35 EDT 2014


On 23Mar2014 12:37, Chris Angelico <rosuav at gmail.com> wrote:
> On Sun, Mar 23, 2014 at 12:07 PM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
> > On Sun, 23 Mar 2014 02:09:20 +1100, Chris Angelico wrote:
> >> On Sun, Mar 23, 2014 at 1:50 AM, Steven D'Aprano
> >> <steve+comp.lang.python at pearwood.info> wrote:
> >>> Line endings are terminators: they end the line. Whether you consider
> >>> the terminator part of the line or not is a matter of opinion (is the
> >>> cover of a book part of the book?) but consider this:
> >>>
> >>>     If you say that the end of lines are *not* part of the line, then
> >>>     that implies that some parts of the file are not inside any line at
> >>>     all. And that would be just weird.
> >>
> >> Not so weird IMO. A file is not a concatenation of lines; it is a stream
> >> of bytes.
> >
> > But a *text file* is a concatenation of lines. The "text file" model is
> > important enough that nearly all programming languages offer a line-based
> > interface to files, and some (Python at least, possibly others) make it
> > the default interface so that iterating over the file gives you lines
> > rather than bytes -- even in "binary" mode.
> 
> And lines are delimited entities. A text file is a sequence of lines,
> separated by certain characters.
[...snip...]

As far as I'm concerned, a text file is a sequence of lines, each of
which is _terminated_ by a newline (or the OS end-of-line flavour).

So I say "terminated by", not "separated by".

Plenty of people use editors that consider end-of-line to be a
separator and not a terminator, leading to supposed text files
lacking a trailing newline (or the OS's end-of-line sequence).

I consider this sloppy and error prone.

I like to be able to read a file and if it lacks a final newline
then I have a good clue that the file was incompletely written.
Editors (and other tools) that won't enforce a trailing newline are
omitting an easy way to give a fairly robust indication of completion,
at no benefit to the user. (Not to mention the visual annoyance of
"cat file" when there's no trailing newline.)

So I'm happy to write code that errors if a line lacks a trailing
newline, and thus I consider the newline to be an integral part
of the line.

Having passed that sanity check, for most machine readable text
formats I'm usually happy to use:

  line = line.rstrip()

to get the salient part of the line.
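
Put together, the check and the strip might look like the sketch
below. The name checked_lines and the error message are made up for
illustration, and it assumes a file opened in text mode, so that the
OS end-of-line has already been translated to "\n". Note that
rstrip() removes all trailing whitespace, not just the newline.

```python
import io

def checked_lines(f, name="<input>"):
    """Yield each line stripped of trailing whitespace.

    Raises ValueError if a line lacks its terminating newline, which
    can only happen on the last line and suggests an incomplete file.
    """
    for lineno, line in enumerate(f, 1):
        if not line.endswith("\n"):
            raise ValueError(
                "%s:%d: missing trailing newline, file may be incomplete"
                % (name, lineno)
            )
        # Sanity check passed: keep only the salient part of the line.
        yield line.rstrip()

if __name__ == "__main__":
    for text in ("alpha\nbeta  \n", "alpha\nbeta"):
        try:
            print(list(checked_lines(io.StringIO(text))))
        except ValueError as e:
            print("error:", e)
```

The first sample is accepted; the second raises, because its last
line was never terminated.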

(Of course, lines extended with slosh-extension or the like need
pickier handling.)
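
As a rough illustration of that pickier handling, here is a sketch
that joins physical lines ending in a backslash into one logical
line. The name joined_lines and the exact rules are assumptions for
the example: a single trailing backslash means "continued", only the
newline and the backslash are dropped, and escaped (doubled)
backslashes are not treated specially. Real formats differ on all of
these points.

```python
import io

def joined_lines(f):
    """Yield logical lines, joining physical lines that end in a backslash."""
    parts = []
    for line in f:
        if not line.endswith("\n"):
            raise ValueError("unterminated final line")
        body = line[:-1]            # drop only the newline, keep other whitespace
        if body.endswith("\\"):
            parts.append(body[:-1]) # drop the backslash and keep reading
            continue
        yield "".join(parts) + body
        parts = []
    if parts:
        # The last physical line asked for a continuation that never came.
        raise ValueError("file ends inside a continued line")

if __name__ == "__main__":
    sample = io.StringIO("first \\\nsecond\nthird\n")
    print(list(joined_lines(sample)))
```

A blanket rstrip() is applied only after joining, if at all, since
whitespace before the backslash may be significant.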

Cheers,
-- 
Cameron Simpson <cs at zip.com.au>

If at first you don't succeed, your sky-diving days are over.
        - Paul Blumstein, paulb at harley.tti.com, DoD #36


