Any help with PLY?

Thu Nov 17 14:52:59 EST 2005

<mark.green at reading.ac.uk> wrote in message
news:1132253408.676406.179100 at g43g2000cwa.googlegroups.com...
> Hi folks,
>
> I've been trying to write a PLY parser and have run into a bit of
> bother.
>
> At the moment, I have a RESERVEDWORD token which matches all reserved
> words and then alters the token type to match the reserved word that
> was detected.  I also have an IDENTIFIER token which matches
> identifiers that are not reserved words.
>
> The problem is, if I put RESERVEDWORD before IDENTIFIER, then
> identifiers that happen to begin with reserved words are wrongly lexed
> as the reserved word followed by an identifier.  For example, because
> "if" is a RESERVEDWORD, the string "ifollowyou" is wrongly lexed as the
> RESERVEDWORD "if" followed by IDENTIFIER "ollowyou", rather than just
> as the IDENTIFIER "ifollowyou".
>
> If I put IDENTIFIER first, though, every single reserved word in the
> input is lexed as an IDENTIFIER.
>
> Is there any way I can tell PLY that it should only return a
> RESERVEDWORD in the correct circumstances?  If PLY can't do this, can
> any of the other Python parser generators?  (It seems that Lex can..)
>
> Thanks!
>
Pyparsing uses the Keyword class for just this purpose.  Before Keyword was
added to pyparsing, one had to solve this problem using the Or operator ('^'),
which performs a longest-string or "greedy" match, as in:

        any_       = Literal("any")
        boolean_   = Literal("boolean")
        char_      = Literal("char")
        double_    = Literal("double")
        ...

        identifier = Word( alphas, alphanums + "_" ).setName("identifier")

        real = Combine( Word(nums+"+-", nums) + dot + Optional( Word(nums) )
                        + Optional( CaselessLiteral("E") +
                                    Word(nums+"+-",nums) ) )
        integer = ( Combine( CaselessLiteral("0x") +
                             Word( nums+"abcdefABCDEF" ) ) |
                    Word( nums+"+-", nums ) ).setName("int")

        udTypeName = delimitedList( identifier, "::",
                                    combine=True ).setName("udType")

        # have to use longest match for type, in case a user-defined
        # type name starts with a keyword type, like "stringSeq" or
        # "longArray"
        typeName = ( any_ ^ boolean_ ^ char_ ^ double_ ^ fixed_ ^
                    float_ ^ long_ ^ octet_ ^ short_ ^ string_ ^
                    wchar_ ^ wstring_ ^ udTypeName )

This way, if a user-defined type were named "stringSequence", the longest
matching expression (udTypeName, rather than string_) would be returned.
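
To make the difference between first-match and longest-match concrete, here
is a rough sketch that uses only the standard re module rather than
pyparsing.  The names ALTERNATIVES, match_first and match_longest are made up
for illustration, and the identifier pattern is a simplified stand-in for
udTypeName (it ignores the "::" separators).  It is meant to show the idea,
not how pyparsing implements it.

```python
import re

# Hypothetical stand-ins for the grammar above: two plain keyword literals
# and a catch-all identifier pattern playing the role of udTypeName.
ALTERNATIVES = [
    ("string_", re.compile(r"string")),
    ("long_", re.compile(r"long")),
    ("udTypeName", re.compile(r"[A-Za-z][A-Za-z0-9_]*")),
]

def match_first(text):
    """Like '|': return the first alternative that matches at position 0."""
    for name, pattern in ALTERNATIVES:
        m = pattern.match(text)
        if m:
            return name, m.group()
    return None

def match_longest(text):
    """Like '^': try every alternative and keep the longest match."""
    best = None
    for name, pattern in ALTERNATIVES:
        m = pattern.match(text)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(match_first("stringSequence"))    # stops at the literal "string"
print(match_longest("stringSequence"))  # prefers the full identifier
```

With plain literals, the first-match version stops as soon as "string"
matches, which is exactly the problem described in the original question.
The longest-match version pays for its correct answer by trying every
alternative.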

Pyparsing also has a MatchFirst alternative matcher, using the '|' operator,
which returns the first matching expression regardless of length.
Predictably, MatchFirst is faster at parsing, since it does not need to
evaluate every path; it can just return the first matching expression.  A
Keyword matches only when it is not immediately followed by another
identifier character, so Keyword("string") will not match the start of
"stringSeq".  Now with Keyword, I can define:

        any_       = Keyword("any")
        boolean_   = Keyword("boolean")
        char_      = Keyword("char")
        double_    = Keyword("double")
        ...
        typeName = ( any_ | boolean_ | char_ | double_ | fixed_ |
                    float_ | long_ | octet_ | short_ | string_ |
                    wchar_ | wstring_ | udTypeName )
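
Roughly speaking, a keyword match is a literal match plus a check that the
next character cannot continue an identifier.  The sketch below imitates that
with a regex negative lookahead, again using only the standard re module.
The helper keyword(), the IDENT_CHARS set and the simplified identifier
pattern are assumptions made for illustration, not pyparsing's own code.

```python
import re

# Assumption: identifier characters are letters, digits, "_" and "$".
IDENT_CHARS = "A-Za-z0-9_$"

def keyword(word):
    """Match `word` only when no identifier character follows it."""
    return re.compile(re.escape(word) + "(?![" + IDENT_CHARS + "])")

# Keywords come first, the catch-all identifier pattern comes last.
ALTERNATIVES = [
    ("string_", keyword("string")),
    ("long_", keyword("long")),
    ("udTypeName", re.compile(r"[A-Za-z][A-Za-z0-9_]*")),
]

def match_first(text):
    """Like '|': return the first alternative that matches at position 0."""
    for name, pattern in ALTERNATIVES:
        m = pattern.match(text)
        if m:
            return name, m.group()
    return None

print(match_first("string"))          # a real keyword
print(match_first("stringSequence"))  # falls through to the identifier
```

Because each keyword pattern refuses to match a mere prefix, the cheaper
first-match strategy is enough, and there is no need to evaluate every
alternative.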

Does PLY support greedy matching?

-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net .)