Python syntax in Lisp and Scheme

Andrew Dalke adalke at mindspring.com
Mon Oct 6 04:09:24 EDT 2003


[cc'ed since I wasn't sure if you would be tracking the c.l.py thread]

Russell Wallace:
> A program is a stream of tokens, which may be separated by whitespace.
> The sequence { (zero or more statements) } is a statement.

Some C tokens may be separated by whitespace and some *must* be
separated by whitespace.

static const int i
static const inti

i + + + 1
i ++ + 1

The last case is ambiguous, so the tokenizer has some logic to
handle that -- specifically, a greedy match with no backtracking.
It throws away the ignorable whitespace and gives a stream of
tokens to the parser.
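A toy maximal-munch tokenizer makes that greedy behavior concrete.
This is a hypothetical sketch (greedy_tokenize and its two-operator
table are mine, not from any real C compiler): at each position it
takes the longest operator that matches, with no backtracking.

```python
OPERATORS = ["++", "+"]  # longest first, so the greedy match wins

def greedy_tokenize(text):
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1            # whitespace separates tokens, then is discarded
            continue
        if ch.isalnum():
            j = i             # scan an identifier or number
            while j < len(text) and text[j].isalnum():
                j += 1
            tokens.append(text[i:j])
            i = j
            continue
        for op in OPERATORS:  # greedy: try the longest operator first
            if text.startswith(op, i):
                tokens.append(op)
                i += len(op)
                break
        else:
            raise ValueError("unexpected character: %r" % ch)
    return tokens

# "i+++1" greedily becomes i, ++, +, 1 -- the same as "i ++ + 1"
print(greedy_tokenize("i+++1"))
print(greedy_tokenize("i + + + 1"))
```

Once the whitespace is gone, "i+++1" and "i ++ + 1" are the same
token stream, while "i + + + 1" is a different one -- which is why
the whitespace matters in the second pair of lines above.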

> What's the equivalent for Python?

One definition is that "a program is a stream of tokens, some
of which may be separated by whitespace and some of which
must be separated by whitespace."  That is, the same as my
reinterpretation of your C definition.

For a real answer, start with

http://python.org/doc/current/ref/line-structure.html
"A Python program is divided into a number of logical lines."

http://python.org/doc/current/ref/logical.html
"The end of a logical line is represented by the token NEWLINE.
Statements cannot cross logical line boundaries except where
NEWLINE is allowed by the syntax (e.g., between statements in
compound statements). A logical line is constructed from one or
more physical lines by following the explicit or implicit line joining
rules."

http://python.org/doc/current/ref/physical.html
"A physical line ends in whatever the current platform's convention
is for terminating lines. On Unix, this is the ASCII LF (linefeed)
character. On Windows, it is the ASCII sequence CR LF (return
followed by linefeed). On Macintosh, it is the ASCII CR (return)
character."

and so on.
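Python's own tokenize module makes the logical/physical line
distinction visible.  In this sketch (the source string is just an
illustration) a newline inside parentheses shows up as the "weak"
NL token, a backslash continuation produces no token at all, and
only the end of a logical line produces NEWLINE:

```python
import io
import tokenize

src = (
    "total = (1 +\n"   # physical line 1: implicit joining inside parens
    "         2)\n"    # physical line 2: same logical line
    "x = total \\\n"   # physical line 3: explicit joining with backslash
    "    + 3\n"        # physical line 4: same logical line
)

names = [tokenize.tok_name[tok.type]
         for tok in tokenize.generate_tokens(io.StringIO(src).readline)]

# Four physical lines, but only two NEWLINE tokens -- two logical lines.
print(names)
```

The parenthesized newline appears as a single NL token, and the
backslash-joined line vanishes entirely: the parser only ever sees
two logical lines.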

> Except that 'if', 'while' etc lines are terminated with delimiters
> rather than newline. Oh, and doesn't Python have the option to use \
> or somesuch to continue a regular line?

The C tokenizer turns the delimiter character into a token.

The Python tokenizer turns indentation level changes into
INDENT and DEDENT tokens.  Thus, the Python parser just
gets a stream of tokens.  I don't see a deep difference here.

Both tokenizers need to know enough about the respective
language to generate the appropriate tokens.
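The tokenize module also shows the INDENT/DEDENT behavior directly;
here is a small sketch (the source string is just an example) where
the parser sees block structure as tokens, much as a C parser sees
{ and } tokens:

```python
import io
import tokenize

src = (
    "if a:\n"
    "    b = 1\n"
    "    c = 2\n"
    "d = 3\n"
)

names = [tokenize.tok_name[tok.type]
         for tok in tokenize.generate_tokens(io.StringIO(src).readline)]

# One INDENT when the block opens, one DEDENT when it closes --
# the indentation has already been turned into ordinary tokens.
print(names)
```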

> But in ways that are objectively less severe because:
>
> - If the indentation is buggered up, the brackets provide the
> information you need to figure out what the indentation should have
> been.

As I pointed out, one of the pitfalls which does occur in C
is the dangling else

if (a)
  if (b)
    c++;
else    /* indented incorrectly but valid */
  c--;

That mistake does not occur in Python.  I personally had
C++ code with a mistake of exactly this sort, based on
indentation.  Three other people and I spent perhaps 10-15
hours, spread over a year, tracking it down.  We all knew
where the bug was supposed to be in the code, but the
indentation threw us off.

> - The whole tabs vs spaces issue doesn't arise.

That's an issue these days?  It's well resolved -- don't
use tabs.

And you know, I can't recall a case where it's ever
been a serious problem for me.  I have a couple of times
had a problem, but never one where the buggy code still
ran and appeared to work, unlike the if/else code I listed
above for C.

                    Andrew
                    dalke at dalkescientific.com
