Pyparsing: Grammar Suggestion

Wed May 17 11:53:17 EDT 2006

On Wednesday, 17 May 2006 17:24, Khoa Nguyen wrote:
> Any suggestions?

If you're not limited to pyparsing, pyrr.ltk/ptk might suit you here 
(especially if you're used to bison/flex). The following file implements a 
small sample lexer/parser that does exactly what you need. pyrr.ltk (the 
lexing toolkit) is stable; pyrr.ptk isn't yet, but it's nevertheless 
available at:

http://hg.modelnine.org/hg/pyrr

as a Mercurial repository. If you're interested, I'd advise you to take the 
version from the repository: my packaged releases always had quirks that, 
AFAICT, the current head of the repository doesn't have.

Anyway, the following implements the parser/lexer for you:

>>>
from pyrr.ltk import LexerBase, IgnoreMatch
from pyrr.ptk import ParserBase

class SampleLexer(LexerBase):

    def f(self,match,data):
        r"""
        f1 [10]-> /f1/
        f2 [10]-> /f2/
        f3 [10]-> /f3/
        f4 [10]-> /f4/
        f5 [10]-> /f5/
        f6 [10]-> /f6/

        Define the specific patterns for each of your fields f1..f6 here...
        """

        return data

    def fid(self,match,data):
        r"""
        fid -> ri/[a-z_][a-z0-9_]*/

        Match a record identifier.
        """

        return data

    def end_of_record(self,match,data):
        r"""
        EOR -> /END_OF_RECORD/

        Your end of record marker...
        """

    def operators(self,match,data):
        r"""
        nl -> e/\n/
        c  -> /,/
        eq -> /=/

        The newline token is my own addition here...
        """

    def ws(self,match,data):
        r"""
        ws -> r/\s+/

        Ignore all whitespace that occurs somewhere in the input.
        """

        raise IgnoreMatch

class SampleParser(ParserBase):
    __start__ = "ifile"

    def ifile(self,data):
        """
        ifile -> record+
        """

        return dict(data)

    def record(self,fid,eq,f1,c1,f2,c2,f3,c3,f4,c4,f5,c5,f6,eor,nl):
        """
        record -> /fid/ /eq/ /f1/? /c/ /f2/? /c/ /f3/? /c/ /f4/? /c/ /f5/? /c/ /f6/? /EOR/ /nl/
        """

        return (fid,(f1,f2,f3,f4,f5,f6))

data = r"""recmark = f1,f2,,f4,f5,f6 END_OF_RECORD
recmark2 = f1,f2,f3,f4,,f6 END_OF_RECORD
"""

# Lex the input, parse the token stream and print the resulting dict.
print(SampleParser.parse(SampleLexer(data)))
>>>
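If you can't (or don't want to) install pyrr, the same lexer/parser split can be sketched with the standard library alone. This is a minimal Python 3 sketch, not pyrr's API: the token names, the assumption that field values are the literals f1..f6, and the helpers `tokenize`/`parse` are hypothetical stand-ins. The lexer tries its patterns in order, so FIELD and EOR win over the generic FID pattern (playing the role of the [10] priorities above), and only spaces and tabs are ignored so that the newline stays significant, as the record rule requires. The parser is a hand-written recursive-descent version of the `record` rule: every field is optional, every comma is mandatory, and a missing field comes back as None.

```python
import re

# Token patterns, tried in order: earlier entries win, so FIELD and EOR
# beat the generic FID pattern. Hypothetical: field values are f1..f6.
TOKEN_SPEC = [
    ("WS", r"[ \t]+"),          # ignored; newlines are NOT swallowed here
    ("NL", r"\n"),
    ("EOR", r"END_OF_RECORD"),
    ("FIELD", r"f[1-6]\b"),
    ("FID", r"[a-z_][a-z0-9_]*"),
    ("C", r","),
    ("EQ", r"="),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, value) pairs, skipping whitespace tokens."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            yield m.lastgroup, m.group()
        pos = m.end()


def parse(text):
    """ifile -> record+ ; returns {fid: (f1, ..., f6)} with None for gaps."""
    tokens = list(tokenize(text))
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found!r}")
        pos += 1
        return tokens[pos - 1][1]

    def optional(kind):
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] == kind:
            pos += 1
            return tokens[pos - 1][1]
        return None

    records = {}
    while pos < len(tokens):
        # record -> fid eq f1? c f2? c f3? c f4? c f5? c f6? EOR nl
        fid = expect("FID")
        expect("EQ")
        fields = [optional("FIELD")]
        for _ in range(5):
            expect("C")
            fields.append(optional("FIELD"))
        expect("EOR")
        expect("NL")
        records[fid] = tuple(fields)
    if not records:
        raise SyntaxError("expected at least one record")
    return records


DATA = """recmark = f1,f2,,f4,f5,f6 END_OF_RECORD
recmark2 = f1,f2,f3,f4,,f6 END_OF_RECORD
"""

if __name__ == "__main__":
    print(parse(DATA))
    # {'recmark': ('f1', 'f2', None, 'f4', 'f5', 'f6'),
    #  'recmark2': ('f1', 'f2', 'f3', 'f4', None, 'f6')}
```

Keeping the commas mandatory while making each field optional is what lets `,,` mean "this field is empty" without any lookahead tricks.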

HTH!

--- Heiko.