OT: Programmers whose first language is not English

Mon Mar 10 12:57:04 EST 2003

On Mon, 10 Mar 2003 16:34:55 +0100, "Anders J. Munch"
<andersjm at dancontrol.dk> wrote:

>"Stephen Horne" <intentionally at blank.co.uk> wrote:
>> 
>> Of course XML isn't a friendly language either, but the programmer
>> looking at his editor is not supposed to see XML.
>
>In that case the storage format is an implementation detail.  Whether
>you represent a token sequence in one structured text format (XML) 
>or another (the kind that the tokenize module recognises) shouldn't
>matter much.  Except that the second form is compatible with what
>everybody else uses.

No. The presentation is an implementation detail. The 'tokenised' form
is not tokenised for compression or easy handling by an interpreter -
it is marked up to represent certain types of semantic information.

The differences are...

(1)  The XML 'tokenised' file is where the standard is defined - not
     the plaintext.

(2)  The presentation of the XML file in the editor is not rigidly
     controlled - it may be user configurable, and may vary from one
     editor to another or even from one mode to another.

(3)  The XML 'tokenised' file will contain semantically important
     information which will be lost if the tokenised file is simply
     stripped of markup. The presentation of this semantic information
     is where most of the editor's freedom lies.

The whole point of using marked up text is to be able to express
semantic ideas using methods that aren't available in plaintext.

In the beginning, this will be small stuff. The user types a word and
the editor highlights it as an identifier rather than a keyword. But
instead of simply displaying the word in a different colour, it also
uses a different markup when saving. Thus, when you load it into the
Thingy-3000 version which has 20,000 new keywords, it knows that the
words which happen to be spelled the same as these new keywords in
your original file actually happen to be identifiers.
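
A rough sketch of the idea, assuming a made-up markup in which every
token is saved inside an element naming its role. The <id> and <op>
tags and the "new" keyword set are hypothetical, not any real format:

```python
import xml.etree.ElementTree as ET

# Hypothetical saved form: each token carries its role as the tag name.
SAVED = (
    "<line><id>match</id><op>=</op><id>find</id>"
    "<op>(</op><id>text</id><op>)</op></line>"
)

# Keyword set of an imagined later language version.
NEW_KEYWORDS = {"match", "case"}

def classify(xml_text):
    """Return (text, role) pairs; the role comes from the markup, not the spelling."""
    return [(tok.text, tok.tag) for tok in ET.fromstring(xml_text)]

def clashes(xml_text, keywords):
    """Identifiers whose spelling collides with a keyword of the newer version."""
    return [text for text, role in classify(xml_text)
            if role == "id" and text in keywords]

print(classify(SAVED))
print(clashes(SAVED, NEW_KEYWORDS))
```

The word "match" stays an identifier on load because the file says so,
whatever the newer keyword list contains.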

Load it into an editor which knows the latest keywords and it can tell
you that you have identifiers spelled the same as keywords and offer
to change them to an alternative, or to remove that keyword from its
dictionary while editing that file. Or maybe it can use a prefix
symbol to tell keywords and identifiers apart. It can even have
hotkeys to assert 'this is an identifier' or 'this is a keyword' in
exceptional cases. It might even keep an identifier dictionary with
multiple spellings and versioning (so that a library with a new
identifier to avoid conflict with a new keyword can also still be
used by a caller which only knows the original identifier, for
instance).
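
The prefix-symbol option might look something like this. It is only a
sketch: the (text, role) token pairs, the "id" role name and the "$"
prefix are all assumptions made for illustration:

```python
def render_with_prefix(tokens, keywords, prefix="$"):
    """Display tokens, marking identifiers that are spelled like a keyword.

    tokens is a list of (text, role) pairs as an editor might hold them
    after loading the marked-up file.
    """
    shown = []
    for text, role in tokens:
        if role == "id" and text in keywords:
            shown.append(prefix + text)  # presentation only; stored text is unchanged
        else:
            shown.append(text)
    return " ".join(shown)

tokens = [("match", "id"), ("=", "op"), ("find", "id"),
          ("(", "op"), ("text", "id"), (")", "op")]
print(render_with_prefix(tokens, {"match"}))
```

The prefix exists only on screen, so a different editor is free to
pick another way of showing the same distinction.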

Instantly, the risk of new keywords breaking old programs just goes
away.

Next - why should we have to escape special characters used within
string literals? If the string literal were just markup, with
presentation left to the editor, the need once again simply goes away.
Your editor might display string literals with a different background
colour, and you might use a hotkey to toggle string-literal entry
instead of typing the quote character.
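
As a small illustration, assuming a hypothetical <str> element holds
the literal: the programmer's text goes in and comes out unchanged,
and whatever escaping the storage layer needs is handled by the XML
library rather than typed by hand.

```python
import xml.etree.ElementTree as ET

def save_string(value):
    """Store a string literal as a hypothetical <str> element."""
    elem = ET.Element("str")
    elem.text = value
    return ET.tostring(elem, encoding="unicode")

def load_string(xml_text):
    """Recover the literal exactly as the programmer entered it."""
    return ET.fromstring(xml_text).text

# Quotes and backslashes that would need escaping in a plaintext literal.
raw = r'She said "hi" from C:\temp & left <now>'
stored = save_string(raw)
print(stored)
print(load_string(stored) == raw)
```

The serializer still escapes & and < in the file, but that is a detail
of the storage format which the person at the editor never sees.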

Next - how about multi-line strings?

One thing I hate is something that looks like...

print """this is line 1
this is line 2
this is line 3
"""

Wouldn't it be better if the editor presented such strings neatly
vertically aligned? Something like...

print this is line 1
      this is line 2
      this is line 3

...but, of course, with the string literal (not the padding space to
the left) highlighted with a different background colour.
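
A sketch of how an editor could produce that layout, assuming it
already has the statement prefix and the string value as separate
pieces (the function name is made up):

```python
def present_multiline(prefix, value):
    """Lay out a multi-line string so every line starts in the same column.

    The padding is display-only; it is not part of the string value.
    """
    pad = " " * len(prefix)
    lines = value.splitlines()
    return "\n".join(
        (prefix if i == 0 else pad) + line for i, line in enumerate(lines)
    )

value = "this is line 1\nthis is line 2\nthis is line 3\n"
print(present_multiline("print ", value))
```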

Next - why do I have to put up with some people's code which seems to
have an indent width of half-a-mile per level? Why can't I just set an
option that displays the code with an indent level I'm comfortable
with?
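
For instance, if the saved form recorded only the nesting level of
each line, the indent width would become a pure display setting. A
minimal sketch, assuming a hypothetical (level, text) representation:

```python
def render_indented(lines, indent_width):
    """Display (level, text) pairs using the reader's preferred indent width."""
    return "\n".join(
        " " * (indent_width * level) + text for level, text in lines
    )

# Stored form: nesting depth plus text, with no literal leading whitespace.
STORED = [(0, "def f(x):"), (1, "if x:"), (2, "return 1"), (1, "return 0")]

print(render_indented(STORED, 2))
print(render_indented(STORED, 8))
```

Both printouts come from the same stored data; only the width chosen
by the reader differs.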

These are just a few of the more obvious issues that can be helped
by defining the standard in terms of marked-up text instead of plain
text.