[Tutor] Fwd: Difference between types

eryksun eryksun at gmail.com
Sun May 26 06:39:36 CEST 2013


On Fri, May 24, 2013 at 2:53 PM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
> Why do I need to use a trailing comma to create a singleton
> tuple? Without a comma it seems to mean "parenthesized single
> object", ie the parentheses are basically not there.

Here are some technical notes and references to augment the other
answers. I think this is on topic in this thread. Even though the
title is "Difference between types", I think Citizen Kant is mostly
concerned with tokenizing and parsing source code.

Python 3.3's grammar definition:

http://hg.python.org/cpython/file/3.3/Grammar/Grammar

Here's an incomplete subset:

  power: atom trailer* ['**' factor]
  factor: ('+'|'-'|'~') factor | power
  term: factor (('*'|'/'|'%'|'//') factor)*
  arith_expr: term (('+'|'-') term)*
  shift_expr: arith_expr (('<<'|'>>') arith_expr)*
  and_expr: shift_expr ('&' shift_expr)*
  xor_expr: and_expr ('^' and_expr)*
  expr: xor_expr ('|' xor_expr)*
  star_expr: '*' expr

  comp_op:
    '<'|'>'|'=='|'>='|'<='|'<>'|'!='|
    'in'|'not' 'in'|'is'|'is' 'not'

  comparison: expr (comp_op expr)*
  not_test: 'not' not_test | comparison
  and_test: not_test ('and' not_test)*
  or_test: and_test ('or' and_test)*
  test:
    or_test ['if' or_test 'else' test] | lambdef

  testlist:
    test (',' test)* [',']

  exprlist:
    (expr|star_expr) (',' (expr|star_expr))* [',']

  testlist_comp:
    (test|star_expr) (
      comp_for | (',' (test|star_expr))* [','] )

  atom:
    ('(' [yield_expr|testlist_comp] ')' |
     '[' [testlist_comp] ']' |
     '{' [dictorsetmaker] '}' |
     NAME | NUMBER | STRING+ |
     '...' | 'None' | 'True' | 'False')

  augassign:
    ('+=' | '-=' | '*=' | '/=' | '%=' |
     '&=' | '|=' | '^=' | '<<=' | '>>=' |
     '**=' | '//=')

  testlist_star_expr:
    (test|star_expr) (',' (test|star_expr))* [',']

  expr_stmt:
    testlist_star_expr (
      augassign (yield_expr|testlist) |
      ('=' (yield_expr|testlist_star_expr))*)

  pass_stmt: 'pass'

  small_stmt:
    (expr_stmt | del_stmt | pass_stmt | flow_stmt |
     import_stmt | global_stmt | nonlocal_stmt |
     assert_stmt)

  simple_stmt:
    small_stmt (';' small_stmt)* [';'] NEWLINE

  suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT

  for_stmt:
    'for' exprlist 'in' testlist ':' suite ['else' ':' suite]

  compound_stmt:
    if_stmt | while_stmt | for_stmt | try_stmt |
    with_stmt | funcdef | classdef | decorated

  stmt:
    simple_stmt | compound_stmt

  eval_input:
    testlist NEWLINE* ENDMARKER

  file_input:
    (NEWLINE | stmt)* ENDMARKER

  single_input:
    NEWLINE | simple_stmt | compound_stmt NEWLINE

A possible match for atom is a set of empty parentheses, which defines
an empty tuple. Another match is a testlist_comp in parentheses. In
this case, if there's only one test/star_expr, then a trailing comma
is required to define a tuple.

The final parse tree depends on where you "start". For example, the
interactive shell uses single_input mode, so it looks for either
NEWLINE (i.e. you hit <enter> at a blank prompt), a simple_stmt, or
compound_stmt NEWLINE. If you enter a compound statement such as a
for_stmt, you have to hit enter on an empty line in order to finish
parsing, compile, and execute the statement. Note also that eval('')
is a syntax error because there's nothing to match testlist in
eval_input mode.

The parser start constants are defined in Include/graminit.h:

    #define single_input 256
    #define file_input 257
    #define eval_input 258

The corresponding C API constants are defined in Include/compile.h:

    #define Py_single_input 256
    #define Py_file_input 257
    #define Py_eval_input 258

For the built-in compile() function these constants map to the modes
'single', 'exec', and 'eval'.

The parser module has a couple of functions that create syntax trees:
parser.expr (eval_input mode) and parser.suite (file_input mode). The
resulting st objects have tolist() and totuple() methods, but the
resulting nested sequences of numeric constants aren't all that
helpful for interactive inspection. I scribbled a little code to
pretty print them using the corresponding names from the tokenize and
symbol modules.

    import parser
    from pprint import pprint
    from tokenize import tok_name
    from symbol import sym_name

    stnames = dict(tok_name)
    stnames.update(sym_name)

    def stnumtoname(st):
        if isinstance(st, parser.STType):
            st = st.tolist()
        stnew = [stnames[st[0]]]
        for item in st[1:]:
            if isinstance(item, (list, tuple)):
                stnew.append(stnumtoname(item))
            elif isinstance(item, int):
                stnew.append(stnames[item])
            else:
                stnew.append(item)
        return stnew

    def pprintst(st, **kwds):
        if 'width' not in kwds:
            kwds['width'] = 60
        pprint(stnumtoname(st), **kwds)


Here's an example tree for the compound statement "for i in (0,): pass":

    >>> pprintst(parser.suite('for i in (0,): pass'))
    ['file_input',
     ['stmt',
      ['compound_stmt',
       ['for_stmt',
        ['NAME', 'for'],
        ['exprlist',
         ['expr',
          ['xor_expr',
           ['and_expr',
            ['shift_expr',
             ['arith_expr',
              ['term',
               ['factor',
                ['power', ['atom', ['NAME', 'i']]]]]]]]]]],
        ['NAME', 'in'],
        ['testlist',
         ['test',
          ['or_test',
           ['and_test',
            ['not_test',
             ['comparison',
              ['expr',
               ['xor_expr',
                ['and_expr',
                 ['shift_expr',
                  ['arith_expr',
                   ['term',
                    ['factor',
                     ['power',
                      ['atom',
                       ['LPAR', '('],
                       ['testlist_comp',
                        ['test',
                         ['or_test',
                          ['and_test',
                           ['not_test',
                            ['comparison',
                             ['expr',
                              ['xor_expr',
                               ['and_expr',
                                ['shift_expr',
                                 ['arith_expr',
                                  ['term',
                                   ['factor',
                                    ['power',
                                     ['atom',
                                      ['NUMBER',
                                       '0']]]]]]]]]]]]]]],
                        ['COMMA', ',']],
                       ['RPAR', ')']]]]]]]]]]]]]]]],
        ['COLON', ':'],
        ['suite',
         ['simple_stmt',
          ['small_stmt', ['pass_stmt', ['NAME', 'pass']]],
          ['NEWLINE', '']]]]]],
     ['NEWLINE', ''],
     ['ENDMARKER', '']]


If you have a debug build of CPython, use the -d command-line option
to see the parser at work. For example here's the [abridged] output
for parsing "(0,)":

    >>> (0,)
    Token LPAR/'(' ... It's a token we know
     DFA 'single_input', state 0: Push ...
     DFA 'simple_stmt', state 0: Push ...
     DFA 'small_stmt', state 0: Push ...
     DFA 'expr_stmt', state 0: Push ...
     DFA 'testlist_star_expr', state 0: Push ...
     DFA 'test', state 0: Push ...
    [snip]
     DFA 'atom', state 0: Shift.

    Token NUMBER/'0' ... It's a token we know
     DFA 'atom', state 1: Push ...
     DFA 'testlist_comp', state 0: Push ...
     DFA 'test', state 0: Push ...
    [snip]
     DFA 'atom', state 0: Shift.
      DFA 'atom', state 4: Direct pop.

    Token COMMA/',' ... It's a token we know
     DFA 'power', state 1: Pop ...
     DFA 'factor', state 2: Pop ...
    [snip]
     DFA 'test', state 1: Pop ...
     DFA 'testlist_comp', state 1: Shift.
    Token RPAR/')' ... It's a token we know
     DFA 'testlist_comp', state 3: Pop ...
     DFA 'atom', state 6: Shift.
      DFA 'atom', state 4: Direct pop.

    Token NEWLINE/'' ... It's a token we know
     DFA 'power', state 1: Pop ...
    [snip]
     DFA 'test', state 1: Pop ...
     DFA 'testlist_star_expr', state 1: Pop ...
     DFA 'expr_stmt', state 1: Pop ...
     DFA 'small_stmt', state 1: Pop ...
     DFA 'simple_stmt', state 1: Shift.
      DFA 'simple_stmt', state 3: Direct pop.
      DFA 'single_input', state 1: Direct pop.
      ACCEPT.
    (0,)

The concrete syntax tree is only the first stage. The next step is to
transform the CST into an AST (abstract syntax tree), constructed
according to the following abstract syntax description (ASDL):

http://hg.python.org/cpython/file/3.3/Parser/Python.asdl

Use ast.parse() and ast.dump() to build and inspect the AST for source
code (it defaults to 'exec' mode):

    >>> print(ast.dump(ast.parse('for i in (0,): pass')))
    Module(body=[For(target=Name(id='i', ctx=Store()),
    iter=Tuple(elts=[Num(n=0)], ctx=Load()),
    body=[Pass()], orelse=[])])


More information about the Tutor mailing list