Atoms, Identifiers, and Primaries

rusi rustompmody at gmail.com
Thu Apr 18 13:04:43 EDT 2013


On Apr 17, 11:43 pm, Steven D'Aprano
<steve+comp.lang.pyt... at pearwood.info> wrote:
>
> You won't gain that from the *grammar* of the language. Grammar is only part
> of the story, and in some ways, the least important part. If I tell you
> that the grammar of English includes:
>
> ADJECTIVE NOUN
>
> that alone is not going to help you understand the differences between
> a "wise man" and a "wise guy", or "peanut oil" and "baby oil".

Heh!! Cute example. [I'll remember it when I am teaching]


> > My question:  what did the interpreter
> > have to do to evaluate the expression x and y and return a value of
> > zero? I know the lexical analyzer has to parse the stream of characters
> > into tokens.  I presume this parsing generates the tokens x, y, and, and
> > a NEWLINE.
>
> Well, yes, but you're being awfully reductionist here. I'm the first to be
> in favour of curiosity for curiosity's sake, but I'm not sure that getting
> bogged down at such a low level this early in your Python learning
> experience is a good idea. *shrug* No skin off my nose though.
>

It is good to be reductionist sometimes (and to stop being reductionist
the rest of the time).

That is to say, it is good to know the general lay of the land for what
happens inside a language implementation.
Broadly speaking it goes like this:
1. Lexical analysis -- separating the source into tokens/lexemes, removing
comments, and (special to Python) making sense of the indentation
structure
2. Syntax analysis -- building the parse tree (at least in principle)
for the program in accordance with the grammar, then converting the
(concrete) parse tree into an abstract syntax tree (AST)
3. Semantic analysis (type-checking) -- there is not much type-checking in
Python, just things like working out which scope each name belongs to
(a NameError itself only shows up at run time).
In a more usual (statically typed) language like C, Java, etc., the AST
gets 'decorated' with type information.
Once you are here, the undesirable cases have been weeded out and the
program (if correct) has been annotated well enough (decorated AST)
for…
4. Code generation/interpretation using a straightforward recursive
walk down the decorated AST
5. An optimizing compiler may do more with the output of 4 (and also
between 3 and 4)
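
To make the steps above concrete for the expression x and y, here is a
rough sketch using the standard library's tokenize and ast modules. The
evaluate function and its env dictionary are my own illustration, not how
CPython does it (CPython compiles the AST to bytecode instead of walking
it), and the sketch handles only names and the 'and' operator.

```python
import ast
import io
import tokenize

source = "x and y\n"

# Step 1: lexical analysis. Collect (token type name, token text) pairs.
# Note that the tokenize module reports keywords such as 'and' as NAME.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# Step 2: syntax analysis. ast.parse hands back the abstract syntax tree.
tree = ast.parse(source, mode="eval")


# Step 4: interpretation as a recursive walk down the AST.
# 'env' is a hypothetical dict standing in for the interpreter's namespace.
def evaluate(node, env):
    if isinstance(node, ast.Expression):
        return evaluate(node.body, env)
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BoolOp) and isinstance(node.op, ast.And):
        result = True
        for operand in node.values:
            result = evaluate(operand, env)
            if not result:
                # Short-circuit: return the first falsy operand itself.
                return result
        return result
    raise NotImplementedError(type(node).__name__)


if __name__ == "__main__":
    for kind, text in tokens:
        print(kind, repr(text))
    print(ast.dump(tree))
    print(evaluate(tree, {"x": 0, "y": 5}))
```

With x bound to 0 the walk stops at the first operand and hands back that
0, which is the "value of zero" from the original question.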

Languages like C put steps 1-5 above into a box called the 'compiler
proper' and stick a preprocessor before it and an assembler and linker
after it.
So while it is good to ask about the lexer, it is also the most boring
and irrelevant part of the system (to paraphrase Steven).


