Python MSI not installing, log file showing name of a Vietnamese communist revolutionary

Sat Mar 22 21:07:32 EDT 2014

On Sun, 23 Mar 2014 02:09:20 +1100, Chris Angelico wrote:

> On Sun, Mar 23, 2014 at 1:50 AM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> Line endings are terminators: they end the line. Whether you consider
>> the terminator part of the line or not is a matter of opinion (is the
>> cover of a book part of the book?) but consider this:
>>
>>     If you say that the end of lines are *not* part of the line, then
>>     that implies that some parts of the file are not inside any line at
>>     all. And that would be just weird.
> 
> Not so weird IMO. A file is not a concatenation of lines; it is a stream
> of bytes. 

But a *text file* is a concatenation of lines. The "text file" model is 
important enough that nearly all programming languages offer a line-based 
interface to files, and some (Python at least, possibly others) make it 
the default interface so that iterating over the file gives you lines 
rather than bytes -- even in "binary" mode.
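A minimal sketch of that behaviour, assuming a throwaway temporary file whose contents are made up for illustration. Iterating yields whole lines, terminator included, in both text and binary mode:

```python
import os
import tempfile

# Create a small scratch file; the contents are purely illustrative.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"spam\neggs\ncheese\n")
    path = tmp.name

# Text mode: iteration yields str lines (newline="" turns off translation).
with open(path, "r", encoding="ascii", newline="") as f:
    text_lines = list(f)

# Binary mode: iteration still yields lines, just as bytes.
with open(path, "rb") as f:
    binary_lines = list(f)

os.remove(path)

print(text_lines)
print(binary_lines)
```

In both cases each item keeps its trailing newline, so joining the items gives back the original content.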

> Now, if you ask Python to read you 512 bytes from a binary
> file, and then ask for another 512 bytes, and so on until you reach the
> end, then it would indeed be VERY weird if there were parts of the file
> that weren't in the returned (byte) strings. But if you ask for a line,
> and then another line, and another line, then it's quite reasonable to
> interpret U+000A as "line separation" rather than "line termination",
> and not return it. (Both interpretations make sense. I just wish the
> most obvious form of iteration gave the cleaner/tidier version, or at
> very least that there be some really obvious way to ask for
> lines-without-endings.)

There is: call rstrip('\n') on the line after reading it. Perl and Ruby 
spell it chomp(). Other languages may spell it differently. I don't know 
of any language that automatically strips newlines, probably because you 
can easily strip the newline from the line yourself, but if the language 
did it for you, you could not reliably reverse it.
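To illustrate why the stripping cannot be reliably undone, here is a small sketch. The two sample strings and the helper name stripped_lines are made up for the example. They differ only in whether the last line is terminated, and that difference disappears once the newlines are removed:

```python
import io

# Two inputs that differ only in the final terminator.
with_final_newline = "alpha\nbeta\n"
without_final_newline = "alpha\nbeta"

def stripped_lines(text):
    """Iterate over text line by line, dropping each trailing newline."""
    return [line.rstrip("\n") for line in io.StringIO(text)]

a = stripped_lines(with_final_newline)
b = stripped_lines(without_final_newline)

# Both inputs collapse to the same list, so the original cannot be
# reconstructed from the stripped lines alone.
print(a == b)
```

Doing the stripping yourself is fine, because you still hold the original line and can decide what to keep.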

> Imagine the output of GNU find as a series of
> records. You can ask for those to be separated by newlines (the default,
> or -print), or by NULs (with the -print0 command). In either case, the
> records do not *contain* that value, they're separated by it; the
> records consist of file names.

I have no problem with that: when interpreting text as a record with 
delimiters, e.g. from a CSV file, you normally exclude the delimiter. 
Sometimes the line terminator does double-duty as a record delimiter as 
well.
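As a rough sketch of the record view, assume some NUL-terminated output in the style of find -print0. The byte string below is invented for the example and no real command is run:

```python
# Hypothetical -print0 style output: each file name is followed by a NUL.
raw = b"notes.txt\x00odd\nname.txt\x00src/main.py\x00"

# Split on the delimiter; the final NUL leaves one empty trailing item.
records = raw.split(b"\x00")
if records and records[-1] == b"":
    records.pop()

# The delimiter is gone, but a newline inside a name survives intact.
print(records)
```

Here the delimiter is excluded from each record, which is exactly what you want at this level, while a newline inside a file name is left alone because it is not the delimiter.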

Reading from a file is considered a low-level operation. Reading 
individual bytes in binary mode is the lowest level; reading lines in 
text mode is the next level, built on top of the lower binary mode. You 
build higher protocols on top of one or the other of those modes, e.g. 
"read a zip file" would be built on top of binary mode, "read a csv file" 
would be built on top of text mode.

As a low-level protocol, you ought to be able to copy a file without 
changing it by reading it in then writing it out:

for blob in infile:
    outfile.write(blob)

ought to work whether you are in text mode or binary mode, so long as the 
infile and outfile are opened in the same mode. If Python were to strip 
newlines, that would no longer be the case.
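A sketch of that round trip, under a few assumptions of mine: the helper name copy_by_lines and the sample data are invented, and in text mode I pass newline="" so that Python does not translate line endings on the way in or out (with the default setting, a \r\n ending would be rewritten).

```python
import os
import tempfile

def copy_by_lines(src, dst, binary):
    """Copy src to dst by iterating over the input, in the requested mode."""
    if binary:
        infile, outfile = open(src, "rb"), open(dst, "wb")
    else:
        # newline="" disables newline translation in text mode.
        infile = open(src, "r", encoding="ascii", newline="")
        outfile = open(dst, "w", encoding="ascii", newline="")
    with infile, outfile:
        for blob in infile:
            outfile.write(blob)

def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()

# Mixed line endings and an unterminated last line, to stress the copy.
original = b"first\nsecond\r\nlast line has no terminator"

results = {}
with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "in.txt")
    with open(src, "wb") as f:
        f.write(original)
    for binary in (False, True):
        dst = os.path.join(workdir, "out-bin" if binary else "out-text")
        copy_by_lines(src, dst, binary)
        # Record whether the copy is byte-for-byte identical.
        results[binary] = read_bytes(dst) == original

print(results)
```

The copy is only faithful because each blob still carries its own terminator; had iteration dropped them, the loop would have no way of knowing what to write back.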

(Even high-level protocols should avoid unnecessary modifications to 
files. One of the more annoying, if not crippling, limitations of the 
configparser module is that reading an INI file in, then writing it out 
again destroys the high-level structure of the file: comments and blank 
lines are stripped, and records may be re-ordered.)
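A small sketch of the configparser behaviour, using an INI snippet invented for the example and the module's default settings:

```python
import configparser
import io

# Illustrative INI content with two full-line comments.
original = """\
# Database settings
[db]
host = localhost
; keep this low in production
port = 5432
"""

parser = configparser.ConfigParser()
parser.read_string(original)

# Write the parsed configuration back out to an in-memory buffer.
buffer = io.StringIO()
parser.write(buffer)
rewritten = buffer.getvalue()

# The option values survive, but the comments are not written back.
print(rewritten)
```

The values come through intact, so nothing is lost as far as the program is concerned, but whoever maintains the file by hand loses their annotations.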

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/