Regex: Parsing Lisp with Python

Thu Aug 8 16:51:13 EDT 2002

Not what the original poster wanted, but oh well...

This appears to be a functional LISP parser using the CVS version of 
SimpleParse 2.0.0 (though it's been so long since I used LISP that I can't 
be sure it's entirely correct). As an example, the result of parsing:

	'''("this\n\r" ' those (+ a b) (23s 0xa3 55.3) "s")'''

(specified as a Python string) is as follows:

[('list',
   0,
   46,
   [('string_double_quote', 1, 9, [('char_no_quote', 2, 8, [])]),
    ('quote', 10, 11, []),
    ('name', 12, 17, []),
    ('list',
     18,
     25,
      [('name', 19, 20, []), ('name', 21, 22, []), ('name', 23, 24, [])]),
    ('list',
     26,
     41,
     [('name', 27, 30, []),
      ('number_expr',
       31,
       35,
       [('number',
         31,
         35,
         [('hex', 31, 35, [('hexdigits', 33, 35, [])])])]),
      ('number_expr',
       36,
       40,
       [('number',
         36,
         40,
         [('float',
           36,
           40,
           [('explicit_base',
             36,
             40,
             [('int_unsigned', 36, 38, []),
              ('decimal_fraction',
               38,
               40,
               [('int_unsigned', 39, 40, [])])])])])])]),
    ('string_double_quote', 42, 45, [('char_no_quote', 43, 44, [])])])]

Enjoy,
Mike

"""Basic LISP parser adapted from the YAPPS documentation's sample

We use shortcuts (the simpleparse.common productions), so we get
double-quoted strings, float, int, and hex atoms, as well as regular
list objects.  Note: Lisp doesn't appear to use , for separating
atoms in lists; I'm not sure whether that's just a feature of the
YAPPS version or not.
"""

definition = r"""
### A LISP parser based on a parser in YAPPS documentation

<ts>        := [ \t\n\r]*
<nameChar>  := [-+*/!@%^&=.a-zA-Z0-9_]
quote       := "'"
name        := nameChar+
>atom<      := quote / string_double_quote / list / number_expr / name

# numbers are regular number values followed
# by something that is _not_ a nameChar
number_expr := number, ?-(nameChar)
list        := "(", seq?, ")"
>seq<       := ts, atom, (ts,atom)*, ts
"""
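from simpleparse.parser import Parser
# importing these modules registers the common productions
# (string_double_quote, number, ...) used in the grammar above
from simpleparse.common import strings, numbers

parser = Parser(definition, 'atom')
# parse() returns (success, result_trees, next_character)
success, children, next_char = parser.parse(
    '''("this\n\r" ' those (+ a b) (23s 0xa3 55.3) "s")''')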
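Since the original question was about getting the tokens back as nested lists, here is a minimal sketch of walking a result tree of (tag, start, stop, children) tuples, the shape shown in the output above, and turning it into nested Python lists of source text. The function name `to_nested` is hypothetical, and treating every non-'list' node as a leaf (keeping its raw text) is my own simplifying assumption, not something SimpleParse does for you.

```python
def to_nested(text, tree):
    """Convert (tag, start, stop, children) tuples into nested lists.

    Assumption: 'list' nodes become Python lists; every other node is
    treated as a leaf and replaced by the slice of text it matched.
    """
    result = []
    for tag, start, stop, children in tree:
        if tag == 'list':
            # recurse into the list's own children
            result.append(to_nested(text, children))
        else:
            # leaf atom: keep the exact source text it matched
            result.append(text[start:stop])
    return result


text = "(+ a (b 12))"
# hand-written tree in the same format as the parser output above
tree = [('list', 0, 12, [
    ('name', 1, 2, []),
    ('name', 3, 4, []),
    ('list', 5, 11, [
        ('name', 6, 7, []),
        ('number_expr', 8, 10, []),
    ]),
])]
print(to_nested(text, tree)[0])  # prints ['+', 'a', ['b', '12']]
```

With the parser above, you would pass the second element of the tuple returned by `parser.parse(text)` as `tree`; a quote atom then comes through as the plain string "'", and strings keep their surrounding double quotes.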

Paul Rubin wrote:
> Thomas Guettler <zopestoller at thomas-guettler.de> writes:
> 
>>I tried it like this, but this gives me all tokens
>>serialized. It is hard to get the second symbol without
>>counting all open and close tokens. Is there a way to get
>>the tokens in nested lists?
> 
> 
> No there's no way to do that with traditional regexps.
> You have to parse the s-expressions.  Normally you do that with
> recursion: on seeing an open-paren, parse additional s-expressions 
> til you see a close-paren, and make a list of them.
> 
> You might look at source code of some lisp interpreters to see how
> this works.  SIOD (Scheme In One Day) is a nice simple one written in
> C, that you can probably find on Google.
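The recursive approach Paul describes also works without SimpleParse. Below is a minimal sketch, assuming a simplified token set (parentheses, the quote character, double-quoted strings without escapes, and whitespace-separated atoms); the names `tokenize` and `parse_sexpr` are hypothetical. The regex only splits the input into tokens, and the recursion does the nesting that a regex alone cannot. As in the grammar above, the quote character is kept as a separate atom rather than applied to the following expression.

```python
import re

# One token per match: a paren, a quote, a "..." string (no escapes),
# or a run of characters that are none of those and not whitespace.
TOKEN = re.compile(r'[()\']|"[^"]*"|[^\s()\'"]+')


def tokenize(text):
    return TOKEN.findall(text)


def parse_sexpr(tokens, pos=0):
    """Parse one s-expression starting at tokens[pos].

    Returns (value, next_pos). Lists become Python lists; atoms stay
    as strings.
    """
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    token = tokens[pos]
    if token == '(':
        items = []
        pos += 1
        # keep parsing sub-expressions until the matching close paren
        while pos < len(tokens) and tokens[pos] != ')':
            item, pos = parse_sexpr(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        return items, pos + 1  # skip the ')'
    if token == ')':
        raise ValueError("unexpected ')'")
    return token, pos + 1


tree, _ = parse_sexpr(tokenize('(define (square x) (* x x))'))
print(tree)  # prints ['define', ['square', 'x'], ['*', 'x', 'x']]
```

Indexing the result, e.g. `tree[1]`, then gives the second element directly, without counting open and close tokens.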

-- 
_______________________________________
   Mike C. Fletcher
   Designer, VR Plumber, Coder
   http://members.rogers.com/mcfletch/