OT: Programmers whos first language is not English

Fri Mar 14 12:32:52 EST 2003

On Fri, 14 Mar 2003 13:52:44 GMT, Alex Martelli <aleax at aleax.it>
wrote:

>Alexander Schmolck wrote:
>
>> Languages have semantics, XML is just syntax.
>
>You don't seem to have any clue at all about the meaning of
>the word "language" in computer science.  Get some such clue:
>it will help you avoid getting egg all over your face (in a
>metaphorical sense) whenever you discuss this.  Any elementary
>course in theoretical computer science will, very early on,
>DEFINE "a language" as "a set of strings". "Semantics", i.e.
>any correspondence between strings that are elements of the
>language and any other kind of entities, is outside the issue.

Harsh, but true.

Actually, the definition is not originally from computer science -
rather it's from analytical linguistics (is that the term?) and
particularly from Chomskys work.

For the record...

(1)  As Alex said, a language is a set of strings. It need not even
have syntax in principle.

(2)  Chomsky defined a classification of languages in terms of how the
set of strings can be recognised as being inside or outside of a
language. This recognition depends on 'rewrite rules'. If a scentence
is known to be within the language, then another scentence derived
from the first using a valid rewrite rule for the language is also by
definition included in the language. With the right rewrite rules,
repeated application of this can in principle 'generate' an infinite
set of strings.

(3)  The rewrite rules effectively describe syntax for most classes of
language.

(4)  The classifications are...

Level 0 : Completely unrestricted, IIRC
Level 1 : Languages defined by context sensitive grammars
Level 2 : Languages defined by context free grammars
Level 3 : Languages defined by regular grammars 
Level 4 : Languages defined by finite choice grammars 

Level 4 was not defined by Chomsky, but turns up from time to time
because it is still a useful concept. The meaning should be obvious -
the language is defined by a finite list of the possible strings.

In level 3, the only recursion permitted is equivalent to repetition.
Regular grammars are typically described using <insert drum roll>
regular expressions.

In level 2, all recursion is permitted. Nesting is therefore possible.
Most parsing is based on context free grammar theory - though mainly
for convenience as the context has to be reintroduced in the form of
various semantic error checks.

In level 1, the validity of one part of a string may be affected by
context from another part of the string.

In level 0, there is no useful structure to exploit. IIRC, while we
can prove the existence of level 0 languages it is in general
impossible to define one because you can neither list all the strings
nor define logical relationships between them. There may be exceptions
(using a definition methods other than rewrite rules) but I'm not sure
- I seem to remember being told that context sensitive grammars are
the most general that its possible to fully define.