Regex: Parsing Lisp with Python
Mike C. Fletcher
mcfletch at rogers.com
Thu Aug 8 16:51:13 EDT 2002
Not what the original poster wanted, but oh well...
This appears to be a functional LISP parser using the CVS version of
SimpleParse 2.0.0 (though it's been so long since I used LISP that I
can't be sure it's entirely correct). As an example, the result of parsing:
'''("this\n\r" ' those (+ a b) (23s 0xa3 55.3) "s")'''
(specified as a Python string) is as follows:
[('list',
  0,
  46,
  [('string_double_quote', 1, 9, [('char_no_quote', 2, 8, [])]),
   ('quote', 10, 11, []),
   ('name', 12, 17, []),
   ('list',
    18,
    25,
    [('name', 19, 20, []), ('name', 21, 22, []), ('name', 23, 24, [])]),
   ('list',
    26,
    41,
    [('name', 27, 30, []),
     ('number_expr',
      31,
      35,
      [('number',
        31,
        35,
        [('hex', 31, 35, [('hexdigits', 33, 35, [])])])]),
     ('number_expr',
      36,
      40,
      [('number',
        36,
        40,
        [('float',
          36,
          40,
          [('explicit_base',
            36,
            40,
            [('int_unsigned', 36, 38, []),
             ('decimal_fraction',
              38,
              40,
              [('int_unsigned', 39, 40, [])])])])])])]),
   ('string_double_quote', 42, 45, [('char_no_quote', 43, 44, [])])])]
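Each entry in that result is a (tag, start, stop, children) tuple, where start and stop are offsets into the parsed string. If nested Python lists are what you are after, something like the following might do. It is only a sketch: to_nested is a hypothetical helper name (not part of SimpleParse), and the tree below is a hand-trimmed copy of the result above with the children of the string and number nodes left out for brevity.

```python
def to_nested(text, node):
    """Turn one (tag, start, stop, children) tuple into nested Python lists.

    'list' nodes become Python lists; every other node becomes the slice
    of the source text that it covers.
    """
    tag, start, stop, children = node
    if tag == 'list':
        return [to_nested(text, child) for child in children]
    return text[start:stop]


text = '''("this\n\r" ' those (+ a b) (23s 0xa3 55.3) "s")'''

# Hand-trimmed copy of the result tree (an assumption for illustration):
# same tags and offsets, but leaf detail below strings/numbers is dropped.
tree = ('list', 0, 46, [
    ('string_double_quote', 1, 9, []),
    ('quote', 10, 11, []),
    ('name', 12, 17, []),
    ('list', 18, 25, [
        ('name', 19, 20, []), ('name', 21, 22, []), ('name', 23, 24, [])]),
    ('list', 26, 41, [
        ('name', 27, 30, []),
        ('number_expr', 31, 35, []),
        ('number_expr', 36, 40, [])]),
    ('string_double_quote', 42, 45, []),
])

nested = to_nested(text, tree)
print(nested[3])  # ['+', 'a', 'b']
```

Everything stays a string here; converting the number_expr slices to int or float would be a further step.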
Enjoy,
Mike
"""Basic LISP parser adapted from the YAPPS documentation's sample.
We use shortcuts, so we get " strings, float, int, and hex
atoms, as well as regular list objects. Note: Lisp doesn't
appear to use , for separating atoms in lists; not sure if
that's just a feature of the YAPPS version or not.
"""
definition = r"""
### A LISP parser based on a parser in YAPPS documentation
<ts> := [ \t\n\r]*
<nameChar> := [-+*/!@%^&=.a-zA-Z0-9_]
quote := "'"
name := nameChar+
>atom< := quote / string_double_quote / list / number_expr / name
# numbers are regular number values followed
# by something that is _not_ a nameCharacter
number_expr := number, ?-(nameChar)
list := "(", seq?, ")"
>seq< := ts, atom, (ts,atom)*, ts
"""
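from simpleparse.parser import Parser
# Importing the common libraries makes their productions
# (string_double_quote, number, ...) available to the grammar above.
from simpleparse.common import strings, numbers
from simpleparse.dispatchprocessor import *
# 'atom' is the root production used when parsing.
parser = Parser( definition, 'atom' )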
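The number_expr rule relies on a negative lookahead: a number only counts if the next character is not a nameChar, which is why 23s comes out as a name in the result above. For anyone more at home with the re module, roughly the same idea can be sketched with (?!...). Note that NUMBER below is a deliberately simplified stand-in of my own, not the pattern SimpleParse's numbers library actually uses.

```python
import re

# Same character class as <nameChar> in the grammar.
NAME_CHAR = r"[-+*/!@%^&=.a-zA-Z0-9_]"

# Simplified stand-in for the 'number' production (an assumption):
# hex, then float, then plain int.
NUMBER = r"(?:0[xX][0-9a-fA-F]+|[0-9]+\.[0-9]+|[0-9]+)"

# A number followed by something that is NOT a nameChar.
number_expr = re.compile(NUMBER + r"(?!" + NAME_CHAR + r")")

for sample in ("23s", "0xa3 55.3", "55.3)"):
    match = number_expr.match(sample)
    print(repr(sample), "->", match.group() if match else None)
```

Without the lookahead, 23s would be split into the number 23 followed by the name s.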
Paul Rubin wrote:
> Thomas Guettler <zopestoller at thomas-guettler.de> writes:
>
>>I tried it like this, but this gives me all tokens
>>serialized. It is hard to get the second symbol without
>>counting all open and close tokens. Is there a way to get
>>the tokens in nested lists?
>
>
> No there's no way to do that with traditional regexps.
> You have to parse the s-expressions. Normally you do that with
> recursion: on seeing an open-paren, parse additional s-expressions
> til you see a close-paren, and make a list of them.
>
> You might look at source code of some lisp interpreters to see how
> this works. SIOD (Scheme In One Day) is a nice simple one written in
> C, that you can probably find on Google.
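For comparison, the recursion Paul describes can be sketched in plain Python without SimpleParse. This is a minimal illustration, not a full Lisp reader: tokenize, read and parse are hypothetical names, the tokenizer is a single simplified regex, and the quote character gets no special handling (it simply becomes part of the atom that follows it).

```python
import re


def tokenize(text):
    """Split text into parens, double-quoted strings, and bare atoms."""
    return re.findall(r'''[()]|"(?:\\.|[^"\\])*"|[^\s()"]+''', text)


def read(tokens, pos=0):
    """Read one s-expression starting at tokens[pos].

    Returns (expression, next_position). On an open-paren, keep reading
    s-expressions until the matching close-paren and collect them in a list.
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == '(':
        items = []
        pos += 1
        while pos < len(tokens) and tokens[pos] != ')':
            item, pos = read(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing close-paren")
        return items, pos + 1  # skip the close-paren
    if token == ')':
        raise SyntaxError("unexpected close-paren")
    return token, pos + 1


def parse(text):
    """Parse the first s-expression found in text."""
    expression, _ = read(tokenize(text))
    return expression


expr = parse('(define (square x) (* x x))')
print(expr[1])  # ['square', 'x']
```

With the nesting preserved, getting at "the second symbol" is just a matter of indexing, with no counting of open and close tokens.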
--
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/