PEP8 and 4 spaces
Steven D'Aprano
steve at pearwood.info
Tue Jul 8 04:48:08 EDT 2014
On Tue, 08 Jul 2014 11:22:25 +1000, Ben Finney wrote:
> A group of (a particular amount of) U+0020 characters is visually
> indistinguishable from a U+0009 character, when the default semantics
> are applied to each.
Hmmm. I'm not sure there actually *is* such a thing as "default
semantics" for tabs. If you look at a tab character in a font, it
probably looks like a single space, but that depends on the font
designer. But if you look at it in a text editor, it will probably look
like eight spaces, unless it looks like four, or some other number, and
if you look at it in a word processor, it will probably look like a "jump
to the next tab stop" command. In a spreadsheet application, it will be a
cell separator and consequently doesn't look like anything at all. I
don't think any of those things count as "default semantics".
The point being, tabs are *control characters*, like newlines and
carriage returns and form feeds, not regular characters like spaces and
"A" or "λ". Since "indent" is an *instruction* rather than a character,
it is best handled with a control character.
In any case, if we limit ourselves to text editors, only a specific
number of spaces will be visually indistinguishable from a tab, where the
number depends on which column you start with:
x	x # Tab
x       x # Seven spaces
x      x # Six spaces
x        x # Eight spaces
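For anyone who wants to check the four rows above without trusting their mail client's whitespace, here is a minimal Python sketch. It assumes tab stops every 8 columns, which is only a common convention and not something every editor follows. The helper name tab_width_at is made up for illustration; str.expandtabs is the standard string method.

```python
# Sketch: how wide a tab "looks" depends on the column where it starts.
# Assumption: tab stops every 8 columns. Editors and terminals may differ.

def tab_width_at(column, tabsize=8):
    """Columns a tab occupies when it starts at 0-based `column`."""
    return tabsize - (column % tabsize)

# The same four rows, in the same order as in the text.
rows = [
    ("x\tx", "Tab"),
    ("x" + " " * 7 + "x", "Seven spaces"),
    ("x" + " " * 6 + "x", "Six spaces"),
    ("x" + " " * 8 + "x", "Eight spaces"),
]

for text, label in rows:
    # expandtabs replaces each tab with spaces up to the next tab stop
    rendered = text.expandtabs(8)
    print(f"{rendered!r:14} # {label}")
```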
Even in a proportional font, the last two should be distinguishable from
the first two. Admittedly, that does leave the case where N spaces (for
some 1 <= N <= 8) look like a tab. That's a problem, but it's not the
only one:
* End of line is a problem. I know of *at least* the following seven
  conventions for end-of-line:
  - ASCII line feed, \n (Unix etc.)
  - ASCII carriage return, \r (Acorn, ZX Spectrum, Apple, etc.)
  - ASCII \r\n (CP/M, DOS, Windows, Symbian, Palm, etc.)
  - ASCII \n\r (RISC OS)
  - ASCII Record Separator, \x1E (QNX)
  - EBCDIC New Line, \N{NEXT LINE} in Unicode (IBM mainframes)
  - ATASCII \x9B (Atari)
* Form feeds are a problem, since they are invisible, but still get used
  (by Vim or Emacs, I forget which) to mark sections of text.
* Issues to do with word-wrapping and hyphenation, or lack thereof, are a
  problem.
* Encoding issues are a problem.
* There are other invisible characters than spaces (non-breaking space,
  em-space, en-space, thin space).
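To make the end-of-line item concrete, here is a rough Python sketch that folds the seven conventions listed above into a plain \n. The names EOL and normalize_newlines are hypothetical. It also assumes the text has already been decoded so that the Atari end-of-line shows up as the character "\x9b" (as it would after a Latin-1 decode) and the EBCDIC New Line as "\x85". Real data may need a proper codec first, and mixed conventions in one file are inherently ambiguous.

```python
import re

# Two-character sequences come first so that "\r\n" and "\n\r" each count
# as ONE line break rather than two.
# Assumption: "\x9b" stands in for the ATASCII end-of-line after decoding.
EOL = re.compile(r"\r\n|\n\r|[\n\r\x1e\x85\x9b]")

def normalize_newlines(text):
    """Replace any of the seven end-of-line conventions with '\\n'."""
    return EOL.sub("\n", text)

# One sample per convention, in the order listed in the text.
for eol in ["\n", "\r", "\r\n", "\n\r", "\x1e", "\x85", "\x9b"]:
    sample = "first" + eol + "second"
    print(repr(eol), "->", repr(normalize_newlines(sample)))
```

For comparison, the built-in str.splitlines already treats \n, \r, \r\n, \x1e and \x85 as line boundaries, but it sees \n\r as two breaks and does not split on \x9b at all.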
The solution is to use a smarter editor. For example, an editor might
draw a horizontal rule to show a form feed on a line of its own, or
highlight unexpected carriage return characters with ^M, or display tabs
in a different colour from spaces, or overlay them with a \x09 glyph. Or an
editor might be smart enough to automatically do what the current
paragraph or block does: if the block is already indented with tabs,
pressing tab inserts a tab, but if it is indented with spaces, pressing
tab inserts spaces.
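The "do what the current block does" rule is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not how any particular editor implements it. The function name indent_for_block and the four-space fallback are assumptions of mine.

```python
def indent_for_block(lines, spaces_per_level=4):
    """Return what a Tab keypress should insert, judging by the block so far.

    `lines` is the current paragraph or block, in order. The nearest
    indented line above the cursor decides.
    """
    for line in reversed(lines):
        if line.startswith("\t"):
            return "\t"                    # block is tab-indented
        if line.startswith(" "):
            return " " * spaces_per_level  # block is space-indented
    # No indented line to learn from: fall back to an assumed default.
    return " " * spaces_per_level

print(repr(indent_for_block(["def f():", "\treturn 1"])))
print(repr(indent_for_block(["def f():", "    return 1"])))
```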
Isn't this why you recommend people use a programmer's editor rather than
Notepad? A good editor should handle these things for you automatically,
or at least with a minimum amount of manual effort.
>> The former is a "control" character, which has specific semantics
>> associated with it; the latter is a "printable" character, which is
>> usually printed and interpreted as itself (although in this particular
>> case, the printed representation is hard to see on most output
>> devices).
>
> And those specific semantics make the display of those characters easily
> confused. That is why it's generally a bad idea to use U+0009 in text
> edited by humans.
I disagree. Using tabs is no more a bad idea than using a formfeed, or
having support for multiple encodings.
>> This mailing list doesn't seem to mind that lines beginning with ASCII
>> SPC characters are semantically different from lines beginning with
>> ASCII LF characters, although many detractors of Python seem unduly
>> fixated on it.
>
> The salient difference being that U+000A LINE FEED is easily visually
> distinguished from a short sequence of U+0020 SPACE characters. This
> avoids the confusion, and makes use of both together unproblematic.
True, but that's *only* because your editor chooses to follow the
convention "display a LINE FEED by starting a new line" rather than by
the convention "display the (invisible or zero-width) glyph of the LINE
FEED". If editors were to standardise on the convention "display a
HORIZONTAL TAB character as visibly distinct from a sequence of
spaces" (e.g. by shading the background a different colour, or overlaying
it with an arrow) then we would not be having this discussion.
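As a rough idea of what such a display convention could look like, here is a small Python sketch that overlays each tab with an arrow while keeping its on-screen width, and marks carriage returns as ^M and form feeds as ^L. The function name show_invisibles, the choice of arrow, and the 8-column tab stops are all assumptions for illustration.

```python
def show_invisibles(line, tabsize=8):
    """Render a line with tabs, carriage returns and form feeds made visible."""
    out = []
    col = 0  # current display column, 0-based
    for ch in line:
        if ch == "\t":
            width = tabsize - (col % tabsize)
            # Arrow in the first cell, padding up to the next tab stop.
            out.append("\u2192" + " " * (width - 1))
            col += width
        elif ch == "\r":
            out.append("^M")
            col += 2
        elif ch == "\f":
            out.append("^L")
            col += 2
        else:
            out.append(ch)
            col += 1
    return "".join(out)

print(show_invisibles("x\tx"))               # tab shown as an arrow
print(show_invisibles("x" + " " * 7 + "x"))  # spaces left untouched
```

With this, a tab and seven spaces occupy the same columns but no longer look alike.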
In other words, it is the choice of editors to be *insufficiently smart*
about tabs that causes the problem. There is a vicious circle here:
* editors don't handle tabs correctly
* which leads to (some) people believing that "tabs are bad" and should
  be avoided
* which leads to editors failing to handle tabs correctly, because "tabs
  are bad" and should be avoided.
A pity really.
--
Steven