Python syntax in Lisp and Scheme

prunesquallor at comcast.net
Tue Oct 7 19:26:35 EDT 2003


Alexander Schmolck <a.schmolck at gmx.net> writes:

> Joe Marshall <jrm at ccs.neu.edu> writes:
>
>
>> Alexander Schmolck <a.schmolck at gmx.net> writes:
>> 
>> > prunesquallor at comcast.net writes:
>> (I'm ignoring the followup-to because I don't read comp.lang.python)
>
> Well, I suppose this thread has spiralled out of control already anyway :)
>  
>> Indentation-based grouping introduces a context-sensitive element into
>> the grammar at a very fundamental level.  Although conceptually a
>> block is indented relative to the containing block, the reality of the
>> situation is that the lines in the file are indented relative to the
>> left margin.  So every line in a block doesn't encode just its depth
>> relative to the immediately surrounding context, but its absolute
>> depth relative to the global context.  
>
> I really don't understand why this is a problem, since it's trivial to
> transform Python's 'global context' dependent indentation block structure
> markup into C/Pascal-style delimiter pair block structure markup.

Of course it can.  Any unambiguous grammar has a parse tree.
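
For concreteness, here is a minimal sketch of the transformation being
discussed.  It is not taken from either poster's tooling: the name
`to_delimited' and the brace tokens are made up for illustration, and it
assumes spaces-only indentation with no continuation lines or
multi-line strings.

```python
def to_delimited(source, open_tok="{", close_tok="}"):
    """Rewrite indentation-structured lines with explicit block delimiters."""
    out = []
    stack = [0]  # indentation widths of the blocks currently open
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no structure
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the enclosing block: a new block opens here.
            out.append(open_tok)
            stack.append(width)
        while width < stack[-1]:
            # Shallower: close every block we have left.
            stack.pop()
            out.append(close_tok)
        out.append(line.strip())
    # Close whatever is still open at end of input.
    while len(stack) > 1:
        stack.pop()
        out.append(close_tok)
    return out
```

Note that the stack of widths is exactly the `global' context: a line's
leading whitespace can only be interpreted against everything that is
still open above it.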

> Significantly, AFAICT you can easily do this unambiguously and *locally*, for
> example your editor can trivially perform this operation on cutting a piece of
> python code and its inverse on pasting (so that you only cut-and-paste the
> 'local' indentation). Prima facie I don't see how you lose any fine control.

Only if your cut boundaries are at the same lexical level.  If you cut
across boundaries, it is no longer clear what should happen at the paste.

Also, it is frequently the case that you need to `tweak' the code after
you paste it.

>> Additionally, each line encodes this information independently of the other
>> lines that logically belong with it, and we all know that when some data is
>> encoded in one place it may be wrong, but it is never inconsistent.
>
> Sorry, I don't understand this sentence, but maybe you mean that the potential
> inconsistency between human and machine interpretation is a *feature* for Lisp,
> C, Pascal etc!? If so I'm really puzzled.

You misunderstand me.  In a python block, two expressions are
associated with each other if they are the same distance from the left
edge.  This is isomorphic to having a nametag identifying the scope
of the line.  Lines are associated with each other iff they have the
same nametag.  Change one, and all must change.

If, instead, you use balanced delimiters, then a subexpression no
longer has to encode its position within the containing expression.

Let me demonstrate the isomorphism.  A simple Python function:
(grrr..   I cut and pasted it, but it lost its indentation between
the PDF file and Emacs.  I hope I redid it right...)

def index(directory):
    # like os.listdir, but traverses directory trees
    stack = [directory]
    files = []
    while stack:
        directory = stack.pop()
        for file in os.listdir(directory):
            fullname = os.path.join(directory, file)
            files.append(fullname)
            if os.path.isdir(fullname) and not os.path.islink(fullname):
                stack.append(fullname)
    return files

Now the reason we know that `            files.append(fullname)' and
`            fullname = os.path.join(directory, file)' are part of the 
same block is because they both begin with 12 spaces.  The first
four spaces encode the fact that they belong to the same function,
the next four indicate that they belong in the while loop, and
the final four indicate that they belong in the for loop.
The `    return files', on the other hand, only has four spaces, so
it cannot be part of the while or for loop, but it is still part
of the function.  I can represent this same information as a code:

t   -def index(directory):
d   -    # like os.listdir, but traverses directory trees
d   -    stack = [directory]
d   -    files = []
d   -    while stack:
dw  -        directory = stack.pop()
dw  -        for file in os.listdir(directory):
dwf -            fullname = os.path.join(directory, file)
dwf -            files.append(fullname)
dwf -            if os.path.isdir(fullname) and not os.path.islink(fullname):
dwfi-                stack.append(fullname)
d   -    return files

The letters in front indicate what lexical group the line belongs to.  This
is simply a different visual format for the leading spaces.
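
The tags can be derived mechanically from the leading spaces, which is
one way to see that the two encodings carry the same information.  The
following is only a sketch: `nametags' is a hypothetical helper, and it
assumes spaces-only indentation and that every block header is a single
line ending in a colon.

```python
SOURCE = """\
def index(directory):
    # like os.listdir, but traverses directory trees
    stack = [directory]
    files = []
    while stack:
        directory = stack.pop()
        for file in os.listdir(directory):
            fullname = os.path.join(directory, file)
            files.append(fullname)
            if os.path.isdir(fullname) and not os.path.islink(fullname):
                stack.append(fullname)
    return files
"""


def nametags(source):
    """Return (tag, line) pairs, where tag names the enclosing blocks."""
    tagged = []
    stack = []  # (indent width of a block header, first letter of its keyword)
    for line in source.splitlines():
        if not line.strip():
            continue
        width = len(line) - len(line.lstrip(" "))
        # Leave every block whose header is indented at least as far as this line.
        while stack and stack[-1][0] >= width:
            stack.pop()
        # "t" marks a top-level line, as in the listing above.
        tag = "".join(letter for _, letter in stack) or "t"
        tagged.append((tag, line))
        if line.rstrip().endswith(":"):
            stack.append((width, line.split()[0][0]))
    return tagged


if __name__ == "__main__":
    for tag, line in nametags(SOURCE):
        print(f"{tag:<4}-{line}")
```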

Now, suppose that I wish to protect the body of the while statement
within a conditional.  Simply adding the conditional won't work:

d   -    while stack:
dw  -        if copacetic():
dw  -        directory = stack.pop()
dw  -        for file in os.listdir(directory):
dwf -            fullname = os.path.join(directory, file)
dwf -            files.append(fullname)
dwf -            if os.path.isdir(fullname) and not os.path.islink(fullname):
dwfi-                stack.append(fullname)

because the grouping information is replicated on each line, I have to
fix this information in the six different places it is encoded:

d    -    while stack:
dw   -        if copacetic():
dwi  -            directory = stack.pop()
dwi  -            for file in os.listdir(directory):
dwif -                fullname = os.path.join(directory, file)
dwif -                files.append(fullname)
dwif -                if os.path.isdir(fullname) and not os.path.islink(fullname):
dwifi-                    stack.append(fullname)
 
The fact that the information is replicated, and that there is nothing
but programmer discipline keeping it consistent is a source of errors.
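
An editor command can of course do the fix-up for you, but it still has
to rewrite every line in the span.  A rough sketch (the helper name
`wrap_in_block' and the four-space step are assumptions for the
example, and it ignores blank lines and tabs):

```python
def wrap_in_block(lines, start, end, header, step=4):
    """Put `header` in front of lines[start:end] and indent that span one level."""
    # The header goes at the indentation the first wrapped line used to have.
    width = len(lines[start]) - len(lines[start].lstrip(" "))
    wrapped = [" " * width + header]
    # Every line in the span gets its leading whitespace rewritten.
    wrapped += [" " * step + line for line in lines[start:end]]
    return lines[:start] + wrapped + lines[end:]


WHILE_LOOP = [
    "    while stack:",
    "        directory = stack.pop()",
    "        for file in os.listdir(directory):",
    "            fullname = os.path.join(directory, file)",
    "            files.append(fullname)",
    "            if os.path.isdir(fullname) and not os.path.islink(fullname):",
    "                stack.append(fullname)",
]

guarded = wrap_in_block(WHILE_LOOP, 1, len(WHILE_LOOP), "if copacetic():")
# Count the original body lines whose text had to change.
touched = sum(old != new for old, new in zip(WHILE_LOOP[1:], guarded[2:]))
```

Here `touched' comes out as 6: one inserted line of logic, six existing
lines edited to stay consistent with it.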

>> There is yet one more problem. The various levels of indentation encode
>> different things: the first level might indicate that it is part of a
>> function definition, the second that it is part of a FOR loop, etc. So on
>> any line, the leading whitespace may indicate all sorts of context-relevant
>> information. 
>
> I don't understand why this is any different to e.g. ')))))' in Lisp. The
> closing ')' for DEFUN just looks the same as that for IF.

That is because the parentheses *only* encode the grouping information,
they do not do double duty and encode what they are grouping.  The key
here is to realize that the words `DEFUN' and `IF' themselves look
very different.

>> Yet the visual representation is not only identical between all of these, it
>> cannot even be displayed.
>
> I don't understand what you mean. Could you maybe give a concrete example of
> the information that can't be displayed? 

Sure.  Here are five parens )))))  How much whitespace is there here:          

>
> Still, I'm sure you're familiar with the following quote (with which I most
> heartily agree):
>
>  "[P]rograms must be written for people to read, and only incidentally for
>    machines to execute."
>
> People can't "read" '))))))))'.

Funny, the people you just quoted would disagree with you about parentheses.
I expect that they would disagree with you about whitespace as well.



