OT: Programmers whose first language is not English

Wed Mar 12 06:19:06 EST 2003

On Wed, 12 Mar 2003 16:48:42 +1300, Paul Foley <see at below.invalid>
wrote:

>Oh.  You just want to represent the tokens?  In that case, you
>probably want something more like
>
>  (keyword "def") (id "test") open-paren (id "x") close-paren indent
>  (keyword "print") (string "x squared is : ") comma (id "x") star
>  (id "x") dedent

No - in the beginning, the main thing I want to represent is tokens -
but not everything need be represented as tokens, and in the long run
there will be other things to include.

Having elements for punctuation such as parentheses and commas seems
extreme and unnecessary. It only becomes necessary if you have no
clear distinction between text and markup.

>than what you wrote; which, indeed, is only slightly better than XML
>(compare
>
>  <keyword>def</keyword><id>test</id><open-paren/><id>x</id>
>  <close-paren/><indent/><keyword>print</keyword>
>  <string>x squared is : </string><comma/><id>x</id><star/>
>  <id>x</id><dedent/>

This isn't as bad as you make out. The main thing I think you're
choking on is the need for the full element identifier in each end
marker. I think the following handles most of your problems in that
respect at least...

>  <keyword name="def"/><id name="test"/><open-paren/><id name="x"/>
>  <close-paren/><indent/><keyword name="print"/>
>  <string>x squared is : </string><comma/><id name="x"/><star/>
>  <id name="x"/><dedent/>
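
For illustration, a minimal sketch of how a stream like the one above
could be generated, assuming Python 3 and its standard tokenize module.
The function name to_xml and the PUNCT table are made up for this
example, and because print is an ordinary function in Python 3 it comes
out as an id rather than a keyword.

```python
import ast
import io
import keyword
import tokenize
from xml.sax.saxutils import escape, quoteattr

# Assumed element names for a few operators, following the quoted example.
PUNCT = {"(": "open-paren", ")": "close-paren", ",": "comma",
         "*": "star", ":": "colon"}

def to_xml(source):
    """Render the tokens of a Python source string as attribute-style XML."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            tag = "keyword" if keyword.iskeyword(tok.string) else "id"
            out.append("<%s name=%s/>" % (tag, quoteattr(tok.string)))
        elif tok.type == tokenize.STRING:
            # Only strings keep element content, since they are free text.
            # This sketch handles plain (non-bytes, non-f) string literals.
            out.append("<string>%s</string>"
                       % escape(ast.literal_eval(tok.string)))
        elif tok.type == tokenize.OP:
            name = PUNCT.get(tok.string)
            if name:
                out.append("<%s/>" % name)
            else:
                out.append("<op name=%s/>" % quoteattr(tok.string))
        elif tok.type == tokenize.INDENT:
            out.append("<indent/>")
        elif tok.type == tokenize.DEDENT:
            out.append("<dedent/>")
        # Other token types (newlines, comments, end marker) are dropped.
    return "".join(out)

SOURCE = 'def test(x):\n    print("x squared is : ", x * x)\n'
XML = to_xml(SOURCE)
```

The output is a flat run of elements; wrapping it in a single root
element would be needed before handing it to an XML parser.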

BTW - Replacing an indent 'block structure' with an indent and dedent
pair seems somewhat against the spirit of either version.

>and tell me that's easier to read!)  But what's the point?  The
>tokenizer generates that anyway; you're just replacing a fairly
>trivial tokenizer with gigabytes of XML-processing crap, most of
>which you don't even want to use.

Which is why I don't need gigabytes of XML-processing crap. Ever hear
of a non-validating parser? It still gets me the character set
translation and well-formedness checks as well as the parsing.
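
As a rough sketch of how little machinery that involves, here is the
non-validating expat parser from the Python standard library reading a
small token stream. The tokens root element, the parse_tokens helper
and the sample identifier are assumptions made for the example. The
input is ISO-8859-1 bytes, and the handler receives ordinary Unicode
strings.

```python
import xml.parsers.expat

def parse_tokens(data):
    """Return a (name, attributes) pair for every start tag in data."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: seen.append((name, attrs))
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return seen

# An identifier with non-ASCII letters, encoded as ISO-8859-1 bytes.
doc = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
       '<tokens><keyword name="def"/><id name="gr\u00f6\u00dfe"/></tokens>'
       ).encode("latin-1")

tokens = parse_tokens(doc)
```

No DTD or schema is involved; the parser only decodes the input and
checks that it is well formed.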

As for the parsing being trivial - it's not exactly rocket science in
either case, though I admit that LISP-style expressions barely need
more than tokenising and parenthesis matching. But even trivial
scanners and parsers are, well, sufficiently non-trivial that writing
one when there's one already written and available to use seems a bit
masochistic. Once you decide to use code generators that argument goes
away, but so does most of the extra complexity of handling XML, since
XML is well within the capabilities of a common LALR parser (except
for checking that element names match).

BTW - the well-formedness rules of XML, and the fact that the parsers
check them, are pretty useful for checking that your XML generation is
working right - and indicating where it may be going wrong.
Parenthesis matching (especially with all parenthesised expressions
using the same pair of symbols) is a much weaker check.
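
A small sketch of the difference, again assuming Python's standard
expat binding. The helper names and the two sample strings are invented
for the example: both come from an imagined generator that closes its
elements in the wrong order. The parenthesis check here is deliberately
naive and ignores parentheses inside string literals.

```python
import xml.parsers.expat

def xml_error(text):
    """Return None if text is well formed, else (message, line, column)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (message, err.lineno, err.offset)
    return None

def parens_balanced(text):
    """Naive check: every ')' has an earlier '(' and none are left open."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

# The same generator bug in both notations: closers emitted in the wrong order.
bad_xml = "<block><string>hi</block></string>"
bad_sexp = '(block (string "hi"))'

xml_result = xml_error(bad_xml)             # reports an error and its position
sexp_result = parens_balanced(bad_sexp)     # True: the bug is invisible
```

Because every closing parenthesis looks alike, the swapped order leaves
no trace in the parenthesised form, while the XML parser reports both
the mismatch and where it occurred.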

>[You don't work for Microsoft, do you?  This sounds like the sort of
>thing they'd do :-)]

There's no need to get nasty ;-)