Tabs versus Spaces in Source Code

Sun May 14 22:04:40 EDT 2006

Tabs versus Spaces in Source Code

Xah Lee, 2006-05-13

In coding a computer program, there's often the choices of tabs or
spaces for code indentation. There is a large amount of confusion about
which is better. It has become what's known as “religious war” —
a heated fight over trivia. In this essay, i like to explain what is
the situation behind it, and which is proper.

Simply put, tabs is proper, and spaces are improper. Why? This may seem
ridiculously simple given the de facto ball of confusion: the semantics
of tabs is what indenting is about, while, using spaces to align code
is a hack.

Now, tech geekers may object this simple conclusion because they itch
to drivel about different editors and so on. The alleged problem
created by tabs as seen by the industry coders are caused by two
things: (1) tech geeker's sloppiness and lack of critical thinking
which lead them to not understanding the semantic purposes of tab and
space characters. (2) Due to the first reason, they have created and
propagated a massive none-understanding and mis-use, to the degree that
many tools (e.g. vi) does not deal with tabs well and using spaces to
align code has become widely practiced, so that in the end spaces seem
to be actually better by popularity and seeming simplicity.

In short, this is a phenomenon of misunderstanding begetting a snowball
of misunderstanding, such that it created a cultural milieu to embrace
this malpractice and kick what is true or proper. Situations like this
happens a lot in unix. For one non-unix example, is the file name's
suffix known as “extension”, where the code of file's type became
part of the file name. (e.g. “.txt”, “.html”, “.jpg”).
Another well-known example is HTML practices in the industry, where
badly designed tags from corporation's competitive greed, and stupid
coding and misunderstanding by coders and their tools are so
wide-spread such that they force the correct way to the side by the
eventual standardization caused by sheer quantity of inproper but set
practice.

Now, tech geekers may still object, that using tabs requires the
editors to set their positions, and plain files don't carry that
information. This is a good question, and the solution is to advance
the sciences such that your source code in some way embed such
information. This would be progress. However, this is never thought of
because the “unix philosophies” already conditioned people to hack
and be shallow. In this case, many will simply use the character
intended to separate words for the purpose of indentation or alignment,
and spread the practice with militant drivels.

Now, given the already messed up situation of the tabs vs spaces by the
unixers and unix brain-washing of the coders in the industry... Which
should we use today? I do not have a good proposition, other than just
use whichever that works for you but put more critical thinking into
things to prevent mishaps like this.

Tabs vs Spaces can be thought of as parameters vs hard-coded values, or
HTML vs ascii format, or XML/CSS vs HTML 4, or structural vs visual, or
semantic vs format. In these, it is always easy to convert from the
former to the latter, but near impossible from the latter to the
former. And, that is because the former encodes information that is
lost in the latter. If we look at the issue of tabs vs spaces, indeed,
it is easy to convert tabs to spaces in a source code, but more
difficult to convert from spaces to tabs. Because, tabs as indentation
actually contains the semantic information about indentation. With
spaces, this critical information is lost in space.

This issue is intimately related to another issue in source code:
soft-wrapped lines versus physical, hard-wrapped lines by EOL (end of
line character). This issue has far more consequences than tabs vs
spaces, and the unixer's unthinking has made far-reaching damages in
the computing industry. Due to unix's EOL ways of thinking, it has
created languages based on EOL (just about ALL languages except the
Lisp family and Mathematica) and tools based on EOL (cvs, diff, grep,
and basically every tool in unix), thoughts based on EOL (software
value estimation by counting EOL, hard-coded email quoting system by
“>” prefix, and silent line-truncations in many unix tools), such
that any progress or development towards a “algorithmic code unit”
concept or language syntaxes are suppressed. I have not written a full
account on this issue, but i've touched it in this essay: “The Harm
of hard-wrapping Lines”, at
http://xahlee.org/UnixResource_dir/writ/hard-wrap.html
----
This post is archived at:
http://xahlee.org/UnixResource_dir/writ/tabs_vs_spaces.html

   Xah
   xah at xahlee.org
 ∑ http://xahlee.org/