Parser Generator?

Paul McGuire ptmcg at austin.rr.com
Sun Aug 26 23:26:32 EDT 2007


On Aug 26, 8:05 pm, "Ryan Ginstrom" <softw... at ginstrom.com> wrote:
> > On Behalf Of Jason Evans
> > Parsers typically deal with tokens rather than individual
> > characters, so the scanner that creates the tokens is the
> > main thing that Unicode matters to.  I have written
> > Unicode-aware scanners for use with Parsing-based parsers,
> > with no problems.  This is pretty easy to do, since Python
> > has built-in support for Unicode strings.
>
> The only caveat being that since Chinese and Japanese scripts don't
> typically delimit "words" with spaces, I think you'd have to pass the text
> through a tokenizer (like ChaSen for Japanese) before using PyParsing.
>
> Regards,
> Ryan Ginstrom

Did you think pyparsing was so mundane as to require spaces between
tokens?  Pyparsing has been doing this type of token recognition since
Day 1.  Looking for tokens without delimiting spaces was one of the
first applications for pyparsing.  This issue is not unique to Chinese
or Japanese text.  Pyparsing will easily find the tokens in this
string:

y=a*x**2+b*x+c

as

['y','=','a','*','x','**','2','+','b','*','x','+','c']

even though there is not a single delimiting space.  But pyparsing
will also render this as a nested parse tree, reflecting the
precedence of operations:

['y', '=', [['a', '*', ['x', '**', 2]], '+', ['b', '*', 'x'], '+', 'c']]

and will allow you to access individual tokens by field name:
- lhs: y
- rhs: [['a', '*', ['x', '**', 2]], '+', ['b', '*', 'x'], '+', 'c']
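
For anyone who wants to try this, here is a minimal sketch of one grammar
that could produce results like those above.  It is an illustration, not
the exact code behind the output shown: the variable names are made up
for this example, and it assumes a pyparsing release that provides
infixNotation (older releases called this helper operatorPrecedence).

```python
from pyparsing import (Literal, OneOrMore, Word, alphas, infixNotation,
                       nums, oneOf, opAssoc)

source = "y=a*x**2+b*x+c"

# 1. Flat tokenization: no whitespace is needed between tokens.
identifier = Word(alphas)
operator = oneOf("** * / + - =")
flat_grammar = OneOrMore(identifier | Word(nums) | operator)
flat_tokens = flat_grammar.parseString(source, parseAll=True).asList()
print(flat_tokens)

# 2. Nested parse tree that reflects operator precedence.
#    The parse action converts integer literals to Python ints.
number = Word(nums).setParseAction(lambda toks: int(toks[0]))
operand = number | identifier
expression = infixNotation(operand, [
    (Literal("**"), 2, opAssoc.RIGHT),   # exponentiation binds tightest
    (oneOf("* /"), 2, opAssoc.LEFT),
    (oneOf("+ -"), 2, opAssoc.LEFT),
])

# 3. Results names ("lhs", "rhs") give access to fields by name.
assignment = identifier("lhs") + "=" + expression("rhs")
parsed = assignment.parseString(source, parseAll=True)
tree = parsed.asList()
print(tree)
print(parsed.lhs)
print(parsed.rhs)
```

The operator levels are listed from highest to lowest precedence, which
is why x**2 is grouped before the multiplication and the additions.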

Please feel free to look through the posted examples on the pyparsing
wiki at http://pyparsing.wikispaces.com/Examples, or some of the
applications currently using pyparsing at http://pyparsing.wikispaces.com/WhosUsingPyparsing,
and you might get a better feel for what kind of tasks pyparsing is
capable of.

-- Paul



