Tabs versus Spaces in Source Code

Xah Lee xah at xahlee.org
Tue May 23 07:02:42 EDT 2006


the following are 2 FAQ following this thread. Thanks.

Addendum: 2006-05-15

Q: What you mean by embeding tab position info into the source code?
How's that gonna be done?

A: Tech geekers may not realize, but such embedding of meta info do
exist in many technologies by various means because of a need. For
example, Mac OS Classic's resource fork and Mac OS X's bundling system,
unix shell script's shebang (#!), emacs and Python's encoding
declaration “#-*- coding: utf-8 -*-”, Unicode's BOM, CVS's
change-log insertion, Mathematica's source code system the Notebook,
Microsoft Word's transparent meta data, as well as HTML and XML's
various declarations embedded in the file. Some of these systems are
good designs and some are hacks.

Somehow tech geekers have the sense that “source code” must be a
plain text file containing nothing else but the programing code. This
may be a defendable position, but as we can see in the above examples,
this idea is primitive and does not address the various needs. If the
tech geekers have thought out about these issues, computing languages
and its source code may have developed into more powerful and flexible
integrated systems as the above standardized examples. For instance,
many commercial development systems actually already have such
meta-data embodied with the source code. (e.g. Borland Delphi,
Metrowerks's CodeWarrior, Microsoft Visual Studio, Wolfram Research's
Mathematica.) Some of which, not only embody development-related info
such as debug points or linking files, but also allow programers to
high-light code for visual purposes like a word processor, or even
display them visually as type-set mathematics.

Q: Converting spaces to tabs is actually easy. I don't see how spacess
lose info.

A: Here is a illustration on how it is not possible to convert spaces
to tabs. Suppose you are writing in a language where the indentation is
part of the semantics, not just for appearance. Now, suppose you have
these two lines:

1234567890
  A
    B

The first line has 2 space prefix and second line has 4 space prefix.
How, if you convert this to tabs, how do you know that's 1 and 2 tabs,
or 2 and 4 tabs? In essence, there is no way to tell how many tabs n
represents, where n is the smallest space prefix in the code, unless n
== 1.

The above demonstrates the information loss in using spaces for
indentation in a theoretical way. There are also practical problems. In
practice, many languages allow string literals like this myName="i love
you", and strings easily can have a run of spaces. One cannot simply
run a blind find-n-replace operation to replace all spaces to tabs. But
also, many unix languages contains a so-called construct of
“heredoc” as a mean to embed a literal block of text. For example,
here's a PHP construct of heredoc:

$novelText = <<<arbitraryCharsHereAsDelimiter
            (__)
            (oo)
     /-------\/
    / |     ||
   *  ||----||
      ~~    ~~
arbitraryCharsHereAsDelimiter;
}

Regardless of its design as a language construct, the purpose of
“heredoc” is that it allows programers to easily embed a text (a
large string), without worrying about the text containing sequence of
characters that may be meaningful to the language. If a language has
heredoc construct, then it is basically impossible to convert from
spaces to tabs, as that will botch literal string embedded in heredoc.
However, it is less of a problem to convert tabs to spaces, because the
frequency of spaces appearing in literal strings are far higher than
literal tabs.

Another practical issue is error recovery. Suppose, one uses 4 spaces
for a indentation. Now, it is not uncommon to see lines with odd number
of space prefixes such as 7 or 10 out of common sloppiness. Such error
would happen more often if spaces are used for indentation, and the
essence is that tabs enforce a semantic association and is impossible
to make a half-indentation.

Q: Well, i just like spaces because they are most compatible.

A: Sure, crass simplicity is always more compatible. Suppose a unixer
will say, he doesn't like HTML because it is fret with problems and
incompatibilities. He'd rather prefer plain text. And, indeed, a lot
unixers seriously think that.

---------------------------
PS in the answer to the first question, i gave the following examples
of IDE/Language that actually embed formatting info in the source code:
Borland Delphi, Metrowerks's CodeWarrior, Microsoft Visual Studio,
Wolfram Research's Mathematica

actually, i know Mathematica does, but i'm not quite sure about the
other examples. So, my question is, does any one knows a language or
IDE that actually allows the coder to manually highlight parts of the
code and this highlight stick with the file upon reopening, as if a
word processor?

   Xah
   xah at xahlee.orghttp://xahlee.org/

Xah Lee wrote:
> Tabs versus Spaces in Source Code
> This post is archived at:
> http://xahlee.org/UnixResource_dir/writ/tabs_vs_spaces.html




More information about the Python-list mailing list