[Tutor] token parser

Sun Feb 11 13:54:30 CET 2007

Dj Gilcrease wrote:
> How would I go about writing a fast token parser to parse a string like
> "[4d6.takeHighest(3)+(2d6*3)-5.5]"
> 
> and get a list like
> ['+',
>     ['takeHighest',
>         ['d',
>             4,
>             6
>         ],
>         3
>     ],
>     ['-',
>         ['*',
>             ['d',
>                 2,
>                 6
>             ],
>             3
>         ],
>         5.5
>     ]
> ]
> 
> back? ( I put it all separated and indented like that so it is easier
> to read, it is for me anyways )

If your input is valid Python (which the above is not: 4d6 and 2d6 are 
not valid identifiers), then the compiler.parse() function might be a 
good starting point. It generates an abstract syntax tree, which you 
could transform into the format you want. In the example below I 
dropped the leading dice counts so that d6 parses as an ordinary name:

In [13]: import compiler

In [19]: compiler.parse("[d6.takeHighest(3)+(d6*3)-5.5]")

Out[19]: Module(None, Stmt([Discard(List([
    Sub((Add((CallFunc(Getattr(Name('d6'), 'takeHighest'),
                       [Const(3)], None, None),
              Mul((Name('d6'), Const(3))))),
         Const(5.5)))]))]))
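
The compiler module exists only in Python 2; it was removed in Python 3, 
where the ast module plays the same role. Here is a minimal sketch of 
the "transform into the format you want" step, assuming Python 3.8 or 
later. The to_list() helper and the BINOPS table are names I made up for 
illustration, and only the node types that occur in this example are 
handled:

```python
import ast

# Map ast operator node types to the symbols used in the desired list format.
BINOPS = {ast.Add: '+', ast.Sub: '-', ast.Mult: '*', ast.Div: '/'}

def to_list(node):
    """Recursively convert an ast expression node into nested lists."""
    if isinstance(node, ast.Expression):
        return to_list(node.body)
    if isinstance(node, ast.List) and len(node.elts) == 1:
        # The outer [...] in the input is only a wrapper, so unwrap it.
        return to_list(node.elts[0])
    if isinstance(node, ast.BinOp):
        return [BINOPS[type(node.op)], to_list(node.left), to_list(node.right)]
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
        # obj.method(args) becomes ['method', obj, arg1, ...]
        args = [to_list(a) for a in node.args]
        return [node.func.attr, to_list(node.func.value)] + args
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError("unsupported syntax: %s" % ast.dump(node))

tree = ast.parse("[d6.takeHighest(3)+(d6*3)-5.5]", mode="eval")
result = to_list(tree)
print(result)
# ['-', ['+', ['takeHighest', 'd6', 3], ['*', 'd6', 3]], 5.5]
```

Note that Python groups + and - from left to right, so the subtraction 
ends up as the outermost node. That differs from the nesting in the list 
you posted.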

If this doesn't work for you, then I would look at one of the many 
parser-generator packages available for Python. I don't know which is 
fastest; I have found pyparsing and PLY to be fairly easy to use. 
pyparsing comes with a lot of examples that might help you get started. 
Here are some summaries of the options:
http://www.nedbatchelder.com/text/python-parsers.html
http://wiki.python.org/moin/LanguageParsing
http://radio.weblogs.com/0100945/2004/04/24.html

Here are two articles that give some examples:
http://www.rexx.com/~dkuhlman/python_201/python_201.html#SECTION007000000000000000000
http://www-128.ibm.com/developerworks/linux/library/l-cpdpars.html?ca=dgr-lnxw02DParser
See also the references in both articles.
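
For a grammar this small, a hand-written tokenizer plus a 
recursive-descent parser is also an option, and it needs nothing outside 
the standard library. This is not one of the packages mentioned above, 
just a rough sketch under my own assumptions: the token names, the 
Parser class, and the grammar (+ and - bind loosest, then * and /, then 
.method(args)) are all invented for illustration, and error handling is 
minimal.

```python
import re

# One alternative per token kind; DICE must come before NUMBER so that
# "4d6" is not split into the number 4 followed by a name.
TOKEN_RE = re.compile(r"""
    (?P<DICE>\d+d\d+)
  | (?P<NUMBER>\d+\.\d+|\d+)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<OP>[-+*/().,\[\]])
  | (?P<WS>\s+)
""", re.VERBOSE)

def tokenize(text):
    """Return a list of (kind, value) pairs, ending with an END marker."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError("unexpected character %r at %d" % (text[pos], pos))
        pos = m.end()
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
    tokens.append(("END", ""))
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i][1]

    def next(self):
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def expect(self, value):
        got = self.next()[1]
        if got != value:
            raise SyntaxError("expected %r, got %r" % (value, got))

    def parse(self):
        # The surrounding [ ... ] is optional.
        bracketed = self.peek() == "["
        if bracketed:
            self.next()
        tree = self.expr()
        if bracketed:
            self.expect("]")
        self.expect("")  # the END marker has an empty value
        return tree

    def expr(self):      # term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.next()[1]
            node = [op, node, self.term()]
        return node

    def term(self):      # postfix (('*' | '/') postfix)*
        node = self.postfix()
        while self.peek() in ("*", "/"):
            op = self.next()[1]
            node = [op, node, self.postfix()]
        return node

    def postfix(self):   # atom ('.' NAME '(' [expr (',' expr)*] ')')*
        node = self.atom()
        while self.peek() == ".":
            self.next()
            kind, name = self.next()
            if kind != "NAME":
                raise SyntaxError("expected a method name, got %r" % name)
            self.expect("(")
            args = []
            if self.peek() != ")":
                args.append(self.expr())
                while self.peek() == ",":
                    self.next()
                    args.append(self.expr())
            self.expect(")")
            node = [name, node] + args
        return node

    def atom(self):      # DICE | NUMBER | '(' expr ')'
        kind, value = self.next()
        if kind == "DICE":
            count, sides = value.split("d")
            return ["d", int(count), int(sides)]
        if kind == "NUMBER":
            return float(value) if "." in value else int(value)
        if value == "(":
            node = self.expr()
            self.expect(")")
            return node
        raise SyntaxError("unexpected token %r" % value)

print(Parser("[4d6.takeHighest(3)+(2d6*3)-5.5]").parse())
```

With left-to-right grouping of + and -, the subtraction is the outermost 
node of the result. If you want the nesting from your post instead, the 
loop in expr() is the place to change.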

Kent