Python parser that records source ranges
Paul Paterson
paulpaterson at users.sourceforge.net
Wed Oct 1 23:39:31 EDT 2003
"Jonathan Edwards" <edwards at nospam.lcs.mit.edu> wrote in message
news:qRKdb.456249$Oz4.260848 at rwcrnsc54...
> The parser library module only records source line numbers for tokens. I
> need a parser that records ranges of line and character locations for
> each AST node, so I can map back to the source. Does anyone know of such
> a thing? Thanks
>
> Jonathan
>
If I understand you correctly, the SimpleParse parser may be just what
you are looking for:
http://simpleparse.sourceforge.net
It is very powerful but still easy to use. The AST it produces gives the
start and end character offsets of the text each node matched. Below is an
example of parsing a statement (from a VB grammar). Each node is a tuple
of (token_name, start_char, end_char, [sub_node1, sub_node2, ...]).
The example looks rather complex because of the grammar, but you can
see that most of the nested sub_node matches cover the same characters in
the source. You can easily map each node back to the corresponding text in
the source by slicing with its start and end offsets.
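As a rough sketch of that mapping, the snippet below walks a tree of
(token_name, start_char, end_char, children) tuples and slices the source
for each node. The helper name iter_nodes and the small hand-written tree
are assumptions made for illustration; they are not part of SimpleParse or
of the VB grammar used in the example.

```python
def iter_nodes(nodes, depth=0):
    """Yield (depth, name, start, end) for every node, depth first."""
    # Leaf nodes may carry None or [] instead of a list of children.
    for name, start, end, children in nodes or ():
        yield depth, name, start, end
        for item in iter_nodes(children, depth + 1):
            yield item


source = "a = f(20, val)\n"

# Hand-written, simplified tree in the same tuple shape (hypothetical names).
tree = [('assignment', 0, 14,
         [('identifier', 0, 1, []),
          ('call', 4, 14,
           [('identifier', 4, 5, []),
            ('integer', 6, 8, None),
            ('identifier', 10, 13, [])])])]

# Pair each node name with the source text it covers.
matches = [(name, source[start:end])
           for _depth, name, start, end in iter_nodes(tree)]

for depth, name, start, end in iter_nodes(tree):
    print("%s%s [%d:%d] %r" % ("  " * depth, name, start, end,
                               source[start:end]))
```

The end offset is treated as exclusive here, which is what the offsets in
the example output suggest (the identifier "a" spans 0 to 1).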
Paul
>>> c("a = f(20, val)", verbose=1)
1 15
[('line_body',
  0,
  15,
  [('single_statement',
    0,
    14,
    [('assignment_statement',
      0,
      14,
      [('object', 0, 1, [('primary', 0, 1, [('identifier', 0, 1, [])])]),
       ('expression',
        4,
        14,
        [('par_expression',
          4,
          14,
          [('base_expression',
            4,
            14,
            [('simple_expr',
              4,
              14,
              [('call',
                4,
                14,
                [('object',
                  4,
                  14,
                  [('primary',
                    4,
                    5,
                    [('identifier', 4, 5, [])]),
                   ('parameter_list',
                    5,
                    14,
                    [('list',
                      5,
                      14,
                      [('bare_list',
                        6,
                        13,
                        [('bare_list_item',
                          6,
                          8,
                          [('expression',
                            6,
                            8,
                            [('par_expression',
                              6,
                              8,
                              [('base_expression',
                                6,
                                8,
                                [('simple_expr',
                                  6,
                                  8,
                                  [('atom',
                                    6,
                                    8,
                                    [('literal',
                                      6,
                                      8,
                                      [('integer',
                                        6,
                                        8,
                                        [('decimalinteger',
                                          6,
                                          8,
                                          None)])])])])])])])]),
                         ('bare_list_item',
                          10,
                          13,
                          [('expression',
                            10,
                            13,
                            [('par_expression',
                              10,
                              13,
                              [('base_expression',
                                10,
                                13,
                                [('simple_expr',
                                  10,
                                  13,
                                  [('call',
                                    10,
                                    13,
                                    [('object',
                                      10,
                                      13,
                                      [('primary',
                                        10,
                                        13,
                                        [('identifier',
                                          10,
                                          13,
                                          [])])])])])])])])])])])])])])])])])])])]),
   ('line_end', 14, 15, [('NEWLINE', 14, 15, None)])])]
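The offsets above are character positions, while the original question asks
for line and character ranges. One possible way to convert between the two
is sketched below, assuming "\n" line endings, 1-based line numbers and
0-based columns. The function names (line_starts, to_line_col, node_range)
are made up for this sketch and are not part of SimpleParse.

```python
import bisect


def line_starts(source):
    """Return the character offset at which each line begins."""
    starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            starts.append(i + 1)
    return starts


def to_line_col(starts, offset):
    """Map a character offset to (1-based line, 0-based column)."""
    # Index of the last line start that is <= offset.
    line = bisect.bisect_right(starts, offset) - 1
    return line + 1, offset - starts[line]


def node_range(starts, node):
    """Return (name, (line, col) of start, (line, col) of end) for a node."""
    name, start, end, _children = node
    return name, to_line_col(starts, start), to_line_col(starts, end)


source = "a = 1\nb = f(20, val)\n"
starts = line_starts(source)

# Hypothetical node covering the call on the second line.
call_node = ('call', 10, 20, None)
print(node_range(starts, call_node))
```

The line-start table is built once per source file, so each lookup is a
binary search rather than a rescan of the text.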