pyparsing

Paul McGuire ptmcg at austin.rr._bogus_.com
Thu May 13 15:40:39 EDT 2004


"Boštjan Jerko" <bostjan.jerko at mf.uni-lj.si> wrote in message
news:87fza46evn.fsf at bostjan-pc.mf.uni-lj.si...
> Hello !
>
> I am trying to understand pyparsing. Here is a little test program
> to check Optional subclass:
>
> from pyparsing import Word,nums,Literal,Optional
>
> lbrack=Literal("[").suppress()
> rbrack=Literal("]").suppress()
> ddot=Literal(":").suppress()
> start = Word(nums+".")
> step = Word(nums+".")
> end = Word(nums+".")
>
> sequence=lbrack+start+Optional(ddot+step)+ddot+end+rbrack
>
> tokens = sequence.parseString("[0:0.1:1]")
> print tokens
>
> tokens1 = sequence.parseString("[1:2]")
> print tokens1
>
> It works for tokens, but an error message is shown for
> the second string ("[1:2]"). I don't get it. I used
> Optional for ddot and step, so I guess they are optional.
>
> Any hints what I am doing wrong?
>
> The versions are pyparsing 1.1.2 and Python 2.3.3.
>
> Thanks,
>
> B.
Bostjan -

Here's how pyparsing is processing your input strings:

[0:0.1:1]
[ = lbrack
0 = start
:0.1 = ddot + step (Optional match)
: = ddot
1 = end
] = rbrack

[1:2]
[ = lbrack
1 = start
:2 = ddot + step (Optional match)
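]  = oops! expected ddot -> failure

Once the Optional has matched ":2", pyparsing does not back up and retry the
parse without it, so the required ddot that follows has nothing left to match.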
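
The trace above can be reproduced with a short script. This is only a sketch
of the grammar from the question: the single name "number" stands in for
start, step and end (all three are the same Word), and try_parse is a helper
made up for this illustration, not part of pyparsing.

```python
from pyparsing import Word, nums, Literal, Optional, ParseException

lbrack = Literal("[").suppress()
rbrack = Literal("]").suppress()
ddot = Literal(":").suppress()
number = Word(nums + ".")  # stands in for start, step and end

# Same shape as the grammar in the question.
sequence = lbrack + number + Optional(ddot + number) + ddot + number + rbrack


def try_parse(text):
    """Hypothetical helper: return the token list, or None on a parse error."""
    try:
        return sequence.parseString(text).asList()
    except ParseException:
        return None


three = try_parse("[0:0.1:1]")  # Optional consumes ":0.1", then ":1" matches
two = try_parse("[1:2]")        # Optional consumes ":2", then ddot fails at "]"

print(three)  # ['0', '0.1', '1']
print(two)    # None
```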


Dang Griffith proposed one alternative construct; here's another, perhaps
more explicit:
    lbrack + start + ( ( ddot + step + ddot + end ) | (ddot + end) ) + rbrack

Note that the order of the inner construct is important, so as to not match
ddot+end before trying ddot+step+ddot+end. '|' is a first-match operator: it
creates a MatchFirst object from pyparsing's class library, which takes the
first alternative that matches and never tries the remaining ones.  You
could avoid this confusion by using '^', which generates an Or object:
    lbrack + start + ( (ddot + end) ^ ( ddot + step + ddot + end ) ) + rbrack
This will evaluate both subconstructs, and choose the longer of the two.
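
To make the ordering point concrete, here is a small sketch comparing '|'
and '^' on both inputs. As before, "number" stands in for start, step and
end, and the helper try_parse and the grammar names are invented for this
example only.

```python
from pyparsing import Word, nums, Literal, ParseException

lbrack = Literal("[").suppress()
rbrack = Literal("]").suppress()
ddot = Literal(":").suppress()
number = Word(nums + ".")  # stands in for start, step and end

long_form = ddot + number + ddot + number   # :step:end
short_form = ddot + number                  # :end

# MatchFirst ('|'): alternatives are tried left to right, first match wins.
first_ok = lbrack + number + (long_form | short_form) + rbrack
first_bad = lbrack + number + (short_form | long_form) + rbrack
# Or ('^'): all alternatives are tried, the longest match wins.
longest = lbrack + number + (short_form ^ long_form) + rbrack


def try_parse(grammar, text):
    """Hypothetical helper: return the token list, or None on a parse error."""
    try:
        return grammar.parseString(text).asList()
    except ParseException:
        return None


for name, grammar in [("first_ok", first_ok),
                      ("first_bad", first_bad),
                      ("longest", longest)]:
    print(name, try_parse(grammar, "[0:0.1:1]"), try_parse(grammar, "[1:2]"))
```

With the short form listed first, '|' accepts ":0.1" and then fails on the
second colon, so first_bad rejects "[0:0.1:1]"; the other two grammars accept
both inputs.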

Or you can use another pyparsing helper, delimitedList:
    lbrack + delimitedList( Word(nums+"."), delim=":") + rbrack
This implicitly suppresses delimiters, so that all you will get back are
["0","0.1","1"] in the first case and ["1","2"] in the second.
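
A quick sketch of the delimitedList version, using the camelCase names from
this thread (the variable names are made up for the example). One thing to
keep in mind: this grammar is looser than the original, since it accepts any
number of colon-separated values, not just two or three.

```python
from pyparsing import Word, nums, Literal, delimitedList

lbrack = Literal("[").suppress()
rbrack = Literal("]").suppress()

# delimitedList drops the ":" delimiters from the results by default.
sequence = lbrack + delimitedList(Word(nums + "."), delim=":") + rbrack

three = sequence.parseString("[0:0.1:1]").asList()
two = sequence.parseString("[1:2]").asList()
one = sequence.parseString("[5]").asList()  # also accepted: a single value

print(three)  # ['0', '0.1', '1']
print(two)    # ['1', '2']
print(one)    # ['5']
```

If exactly two or three values are required, check the length of the result
after parsing, or use one of the explicit constructs shown earlier.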

Happy pyparsing!
-- Paul




