Python parser that records source ranges

Paul Paterson paulpaterson at users.sourceforge.net
Wed Oct 1 23:39:31 EDT 2003


"Jonathan Edwards" <edwards at nospam.lcs.mit.edu> wrote in message
news:qRKdb.456249$Oz4.260848 at rwcrnsc54...
> The parser library module only records source line numbers for tokens. I
> need a parser that records ranges of line and character locations for
> each AST node, so I can map back to the source. Does anyone know of such
> a thing? Thanks
>
> Jonathan
>

If I understand you correctly, then the SimpleParse parser may be just what
you are looking for:

http://simpleparse.sourceforge.net

It is very powerful but still easy to use. The AST it produces records the
start and end character offsets of the text each node matched. Below is an
example of parsing a statement (from a VB grammar). Each node is a tuple of
(token_name, start_char, end_char, [sub_node1, sub_node2, ...]).

The example below looks rather complex because the grammar is deeply nested,
but you can see that most of the nested sub_nodes span the same characters in
the source. Slicing the source text with a node's start and end offsets gives
the text that node matched.
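
As a rough sketch of that mapping (plain Python, not part of SimpleParse; the
helper name iter_nodes and the hand-trimmed tree are made up for illustration),
you can walk the nested tuples and slice the source with each node's offsets.
Judging by the output, the end offset is exclusive, and leaf nodes may carry
either [] or None as their child list, so the walker accepts both:

```python
def iter_nodes(nodes, source, depth=0):
    """Yield (depth, name, start, end, text) for every node in a result tree.

    `nodes` is a list of (name, start, end, children) tuples, where
    `children` is a list of the same shape, an empty list, or None.
    """
    for name, start, end, children in nodes or ():
        # start/end are character offsets; end is treated as exclusive.
        yield depth, name, start, end, source[start:end]
        yield from iter_nodes(children, source, depth + 1)


source = "a = f(20, val)\n"

# Hand-trimmed stand-in for the real parse result (hypothetical, much
# shallower than what the VB grammar actually produces).
tree = [
    ('assignment_statement', 0, 14, [
        ('object', 0, 1, [('identifier', 0, 1, [])]),
        ('expression', 4, 14, [('integer', 6, 8, None)]),
    ]),
]

rows = list(iter_nodes(tree, source))
for depth, name, start, end, text in rows:
    print("%s%s [%d:%d] %r" % ("  " * depth, name, start, end, text))
```

Each row pairs a node name with the exact slice of source it covers, which is
usually all you need to map a node back to the original text.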

Paul

>>> c("a = f(20, val)", verbose=1)
1 15
[('line_body',
  0,
  15,
  [('single_statement',
    0,
    14,
    [('assignment_statement',
      0,
      14,
      [('object', 0, 1, [('primary', 0, 1, [('identifier', 0, 1, [])])]),
       ('expression',
        4,
        14,
        [('par_expression',
          4,
          14,
          [('base_expression',
            4,
            14,
            [('simple_expr',
              4,
              14,
              [('call',
                4,
                14,
                [('object',
                  4,
                  14,
                  [('primary',
                    4,
                    5,
                    [('identifier', 4, 5, [])]),
                   ('parameter_list',
                    5,
                    14,
                    [('list',
                      5,
                      14,
                      [('bare_list',
                        6,
                        13,
                        [('bare_list_item',
                          6,
                          8,
                          [('expression',
                            6,
                            8,
                            [('par_expression',
                              6,
                              8,
                              [('base_expression',
                                6,
                                8,
                                [('simple_expr',
                                  6,
                                  8,
                                  [('atom',
                                    6,
                                    8,
                                    [('literal',
                                      6,
                                      8,
                                      [('integer',
                                        6,
                                        8,
                                        [('decimalinteger',
                                          6,
                                          8,
                                          None)])])])])])])])]),
                         ('bare_list_item',
                          10,
                          13,
                          [('expression',
                            10,
                            13,
                            [('par_expression',
                              10,
                              13,
                              [('base_expression',
                                10,
                                13,
                                [('simple_expr',
                                  10,
                                  13,
                                  [('call',
                                    10,
                                    13,
                                    [('object',
                                      10,
                                      13,
                                      [('primary',
                                        10,
                                        13,
                                        [('identifier',
                                          10,
                                          13,
                                          [])])])])])])])])])])])])])])])])])])])]),
   ('line_end', 14, 15, [('NEWLINE', 14, 15, None)])])]
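
Since the original question asked for line and character ranges: the offsets
above are plain character positions into the source string, so they can be
converted to (line, column) pairs afterwards. A minimal sketch, assuming "\n"
line endings, 1-based lines and 0-based columns; line_starts, to_line_col and
node_range are made-up helper names, not SimpleParse functions:

```python
import bisect


def line_starts(source):
    """Return the character offset at which each line of `source` begins."""
    starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            starts.append(i + 1)
    return starts


def to_line_col(starts, offset):
    """Convert a character offset to (1-based line, 0-based column)."""
    line = bisect.bisect_right(starts, offset) - 1
    return line + 1, offset - starts[line]


def node_range(starts, node):
    """Return ((line, col), (line, col)) for a (name, start, end, children) node.

    The second pair corresponds to the node's exclusive end offset.
    """
    name, start, end, children = node
    return to_line_col(starts, start), to_line_col(starts, end)


source = "a = 1\nb = f(20, val)\n"
starts = line_starts(source)

# Hypothetical node covering the call on the second line.
call_node = ('call', 10, 20, [])
print(node_range(starts, call_node))
```

Building the table of line starts once and using bisect keeps each lookup
cheap even when the tree has many nodes.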





