[Tutor] Fwd: Difference between types
eryksun
eryksun at gmail.com
Sun May 26 06:39:36 CEST 2013
On Fri, May 24, 2013 at 2:53 PM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
> Why do I need to use a trailing comma to create a singleton
> tuple? Without a comma it seems to mean "parenthesized single
> object", ie the parentheses are basically not there.
Here are some technical notes and references to augment the other
answers. I think this is on topic in this thread. Even though the
title is "Difference between types", I think Citizen Kant is mostly
concerned with tokenizing and parsing source code.
Python 3.3's grammar definition:
http://hg.python.org/cpython/file/3.3/Grammar/Grammar
Here's an incomplete subset:
power: atom trailer* ['**' factor]
factor: ('+'|'-'|'~') factor | power
term: factor (('*'|'/'|'%'|'//') factor)*
arith_expr: term (('+'|'-') term)*
shift_expr: arith_expr (('<<'|'>>') arith_expr)*
and_expr: shift_expr ('&' shift_expr)*
xor_expr: and_expr ('^' and_expr)*
expr: xor_expr ('|' xor_expr)*
star_expr: '*' expr
comp_op:
'<'|'>'|'=='|'>='|'<='|'<>'|'!='|
'in'|'not' 'in'|'is'|'is' 'not'
comparison: expr (comp_op expr)*
not_test: 'not' not_test | comparison
and_test: not_test ('and' not_test)*
or_test: and_test ('or' and_test)*
test:
or_test ['if' or_test 'else' test] | lambdef
testlist:
test (',' test)* [',']
exprlist:
(expr|star_expr) (',' (expr|star_expr))* [',']
testlist_comp:
(test|star_expr) (
comp_for | (',' (test|star_expr))* [','] )
atom:
('(' [yield_expr|testlist_comp] ')' |
'[' [testlist_comp] ']' |
'{' [dictorsetmaker] '}' |
NAME | NUMBER | STRING+ |
'...' | 'None' | 'True' | 'False')
augassign:
('+=' | '-=' | '*=' | '/=' | '%=' |
'&=' | '|=' | '^=' | '<<=' | '>>=' |
'**=' | '//=')
testlist_star_expr:
(test|star_expr) (',' (test|star_expr))* [',']
expr_stmt:
testlist_star_expr (
augassign (yield_expr|testlist) |
('=' (yield_expr|testlist_star_expr))*)
pass_stmt: 'pass'
small_stmt:
(expr_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | nonlocal_stmt |
assert_stmt)
simple_stmt:
small_stmt (';' small_stmt)* [';'] NEWLINE
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
for_stmt:
'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
compound_stmt:
if_stmt | while_stmt | for_stmt | try_stmt |
with_stmt | funcdef | classdef | decorated
stmt:
simple_stmt | compound_stmt
eval_input:
testlist NEWLINE* ENDMARKER
file_input:
(NEWLINE | stmt)* ENDMARKER
single_input:
NEWLINE | simple_stmt | compound_stmt NEWLINE
A possible match for atom is a set of empty parentheses, which defines
an empty tuple. Another match is a testlist_comp in parentheses. In
this case, if there's only one test/star_expr, then a trailing comma
is required to define a tuple.
The final parse tree depends on where you "start". For example, the
interactive shell uses single_input mode, so it looks for either
NEWLINE (i.e. you hit <enter> at a blank prompt), a simple_stmt, or
compound_stmt NEWLINE. If you enter a compound statement such as a
for_stmt, you have to hit enter on an empty line in order to finish
parsing, compile, and execute the statement. Note also that eval('')
is a syntax error because there's nothing to match testlist in
eval_input mode.
The parser start constants are defined in Include/graminit.h:
#define single_input 256
#define file_input 257
#define eval_input 258
The corresponding C API constants are defined in Include/compile.h:
#define Py_single_input 256
#define Py_file_input 257
#define Py_eval_input 258
For the built-in compile() function these constants map to the modes
'single', 'exec', and 'eval'.
The parser module has a couple of functions that create syntax trees:
parser.expr (eval_input mode) and parser.suite (file_input mode). The
resulting st objects have tolist() and totuple() methods, but the
resulting nested sequences of numeric constants aren't all that
helpful for interactive inspection. I scribbled a little code to
pretty print them using the corresponding names from the tokenize and
symbol modules.
import parser
from pprint import pprint
from tokenize import tok_name
from symbol import sym_name
stnames = dict(tok_name)
stnames.update(sym_name)
def stnumtoname(st):
if isinstance(st, parser.STType):
st = st.tolist()
stnew = [stnames[st[0]]]
for item in st[1:]:
if isinstance(item, (list, tuple)):
stnew.append(stnumtoname(item))
elif isinstance(item, int):
stnew.append(stnames[item])
else:
stnew.append(item)
return stnew
def pprintst(st, **kwds):
if 'width' not in kwds:
kwds['width'] = 60
pprint(stnumtoname(st), **kwds)
Here's an example tree for the compound statement "for i in (0,): pass":
>>> pprintst(parser.suite('for i in (0,): pass'))
['file_input',
['stmt',
['compound_stmt',
['for_stmt',
['NAME', 'for'],
['exprlist',
['expr',
['xor_expr',
['and_expr',
['shift_expr',
['arith_expr',
['term',
['factor',
['power', ['atom', ['NAME', 'i']]]]]]]]]]],
['NAME', 'in'],
['testlist',
['test',
['or_test',
['and_test',
['not_test',
['comparison',
['expr',
['xor_expr',
['and_expr',
['shift_expr',
['arith_expr',
['term',
['factor',
['power',
['atom',
['LPAR', '('],
['testlist_comp',
['test',
['or_test',
['and_test',
['not_test',
['comparison',
['expr',
['xor_expr',
['and_expr',
['shift_expr',
['arith_expr',
['term',
['factor',
['power',
['atom',
['NUMBER',
'0']]]]]]]]]]]]]]],
['COMMA', ',']],
['RPAR', ')']]]]]]]]]]]]]]]],
['COLON', ':'],
['suite',
['simple_stmt',
['small_stmt', ['pass_stmt', ['NAME', 'pass']]],
['NEWLINE', '']]]]]],
['NEWLINE', ''],
['ENDMARKER', '']]
If you have a debug build of CPython, use the -d command-line option
to see the parser at work. For example here's the [abridged] output
for parsing "(0,)":
>>> (0,)
Token LPAR/'(' ... It's a token we know
DFA 'single_input', state 0: Push ...
DFA 'simple_stmt', state 0: Push ...
DFA 'small_stmt', state 0: Push ...
DFA 'expr_stmt', state 0: Push ...
DFA 'testlist_star_expr', state 0: Push ...
DFA 'test', state 0: Push ...
[snip]
DFA 'atom', state 0: Shift.
Token NUMBER/'0' ... It's a token we know
DFA 'atom', state 1: Push ...
DFA 'testlist_comp', state 0: Push ...
DFA 'test', state 0: Push ...
[snip]
DFA 'atom', state 0: Shift.
DFA 'atom', state 4: Direct pop.
Token COMMA/',' ... It's a token we know
DFA 'power', state 1: Pop ...
DFA 'factor', state 2: Pop ...
[snip]
DFA 'test', state 1: Pop ...
DFA 'testlist_comp', state 1: Shift.
Token RPAR/')' ... It's a token we know
DFA 'testlist_comp', state 3: Pop ...
DFA 'atom', state 6: Shift.
DFA 'atom', state 4: Direct pop.
Token NEWLINE/'' ... It's a token we know
DFA 'power', state 1: Pop ...
[snip]
DFA 'test', state 1: Pop ...
DFA 'testlist_star_expr', state 1: Pop ...
DFA 'expr_stmt', state 1: Pop ...
DFA 'small_stmt', state 1: Pop ...
DFA 'simple_stmt', state 1: Shift.
DFA 'simple_stmt', state 3: Direct pop.
DFA 'single_input', state 1: Direct pop.
ACCEPT.
(0,)
The concrete syntax tree is only the first stage. The next step is to
transform the CST into an AST (abstract syntax tree), constructed
according to the following abstract syntax description (ASDL):
http://hg.python.org/cpython/file/3.3/Parser/Python.asdl
Use ast.parse() and ast.dump() to build and inspect the AST for source
code (it defaults to 'exec' mode):
>>> print(ast.dump(ast.parse('for i in (0,): pass')))
Module(body=[For(target=Name(id='i', ctx=Store()),
iter=Tuple(elts=[Num(n=0)], ctx=Load()),
body=[Pass()], orelse=[])])
More information about the Tutor
mailing list