regex help for a newbie

Diez B. Roggisch deetsNOSPAM at web.de
Fri Apr 9 13:05:35 EDT 2004


> I need to do this by real parsing. In fact the solution from Diez isn't
> enough. I will have to write a much more flexible parser, as I realized.

Why isn't it enough? If all you need is to extract that parenthesized
structure, a self-written parser should be the easiest approach. Consider this:

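import re

def parse(sg):
    # sg must be an iterator shared by all recursive calls: "%(" opens a
    # nested list (filled by the recursive call), ")" closes the current one.
    res = []
    for c in sg:
        if c == "%(":
            res.append(parse(sg))
        elif c == ")":
            return res
        else:
            res.append(c)
    return res


def sgen(s):
    # The capturing group keeps "%(" and ")" as tokens; skip the empty
    # strings that re.split produces at the ends and between delimiters.
    rex = re.compile(r"(%\(|\))")
    for token in rex.split(s):
        if token:
            yield token


print(parse(sgen("%(BBB%(CCC)BBB)")))   # prints [['BBB', ['CCC'], 'BBB']]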
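If "more flexible" mainly means rejecting bad input, the same recursive idea can be made strict. The following is only a sketch; the names `parse_strict`, `tokens` and `TOKEN_RE` are hypothetical and not from any library. It raises `ValueError` on an unmatched ")" or a missing ")" instead of silently returning a partial result:

```python
import re

# Hypothetical tokenizer: split on "%(" and ")", keep the delimiters,
# and drop the empty strings that re.split leaves behind.
TOKEN_RE = re.compile(r"(%\(|\))")


def tokens(s):
    for token in TOKEN_RE.split(s):
        if token:
            yield token


def parse_strict(s):
    """Parse s into nested lists, raising ValueError on unbalanced input."""
    it = tokens(s)  # one iterator shared by every nesting level

    def group(depth):
        res = []
        for tok in it:
            if tok == "%(":
                res.append(group(depth + 1))
            elif tok == ")":
                if depth == 0:
                    raise ValueError("unmatched ')'")
                return res
            else:
                res.append(tok)
        # Ran out of tokens: every "%(" must have been closed by now.
        if depth > 0:
            raise ValueError("missing ')'")
        return res

    return group(0)


print(parse_strict("%(BBB%(CCC)BBB)"))   # prints [['BBB', ['CCC'], 'BBB']]
```

Tracking the depth explicitly is what lets the parser tell a legitimate end of input apart from a group that was never closed.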

> 
> Diez mentioned spark as a parser. I also found yappy, which is a parser
> generator. I have not much experience with parsers. What is the
> difference between these two? When should one use the one, when the
> other?

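yappy is an LR(1) parser generator, and spark is an Earley parser. Both of
them are suited to your problem. LR(1) parsing only accepts grammars that can
be parsed deterministically with one token of lookahead, whereas Earley's
algorithm accepts any context-free grammar (even ambiguous ones), at a
worst-case cost of O(n^3). A nested structure like yours fits either.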

I personally found spark easy to use, as it's very declarative, but I don't
know yappy; maybe that's cool, too.
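To give a feel for what "Earley parser" means, here is a minimal Earley *recognizer*. It is not spark's API or code. It only answers yes/no, and it assumes a grammar without empty (epsilon) rules, which real implementations must also handle. `GRAMMAR`, `categorize` and `earley_recognize` are hypothetical names, and the grammar for the %( ... ) structure is my own assumption:

```python
import re

# Hypothetical grammar for the %( ... ) structure, written without
# epsilon rules so the simple recognizer below stays correct.
GRAMMAR = {
    "S": [("Item",), ("S", "Item")],
    "Item": [("TEXT",), ("%(", "S", ")"), ("%(", ")")],
}

TOKEN_RE = re.compile(r"(%\(|\))")


def categorize(s):
    """Turn the input into terminal names: '%(', ')' or 'TEXT'."""
    return [t if t in ("%(", ")") else "TEXT"
            for t in TOKEN_RE.split(s) if t]


def earley_recognize(grammar, start, tokens):
    """Return True if tokens derive from start (grammar has no epsilon rules)."""
    # An item is (lhs, rhs, dot, origin): rule lhs -> rhs, where rhs[:dot]
    # has matched the tokens from position origin up to the current one.
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            new_items = []
            if dot < len(rhs) and rhs[dot] in grammar:
                # Predict: expand the nonterminal right after the dot.
                new_items = [(rhs[dot], prod, 0, i) for prod in grammar[rhs[dot]]]
            elif dot < len(rhs):
                # Scan: the terminal after the dot matches the next token.
                if i < len(tokens) and tokens[i] == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every item at origin that waited for lhs.
                new_items = [(l2, r2, d2 + 1, o2)
                             for (l2, r2, d2, o2) in chart[origin]
                             if d2 < len(r2) and r2[d2] == lhs]
            for item in new_items:
                if item not in chart[i]:
                    chart[i].add(item)
                    agenda.append(item)

    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[len(tokens)])


print(earley_recognize(GRAMMAR, "S", categorize("%(BBB%(CCC)BBB)")))  # True
print(earley_recognize(GRAMMAR, "S", categorize("%(BBB")))            # False
```

Note that the grammar is left-recursive (`S -> S Item`), which Earley's algorithm handles without any rewriting; a naive recursive-descent parser would loop forever on it.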

-- 
Regards,

Diez B. Roggisch



More information about the Python-list mailing list