Parser Generator?

Ryan Ginstrom software at ginstrom.com
Sun Aug 26 21:05:50 EDT 2007


> On Behalf Of Jason Evans
> Parsers typically deal with tokens rather than individual 
> characters, so the scanner that creates the tokens is the 
> main place where Unicode matters.  I have written 
> Unicode-aware scanners for use with Parsing-based parsers, 
> with no problems.  This is pretty easy to do, since Python 
> has built-in support for Unicode strings.
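As a rough illustration of the kind of Unicode-aware scanner described above, here is a minimal sketch using only the standard `re` module (Python 3). The token names (`NUMBER`, `WORD`, `PUNCT`) and the `scan` function are hypothetical, chosen for this example. The sketch relies on the fact that in Python 3, `\w` and `\d` in a `str` pattern match Unicode letters and digits by default. How the resulting tokens are handed to a particular parser is not shown.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # hypothetical token category, e.g. "WORD"
    text: str  # the matched source text


# Every character falls into exactly one alternative, so finditer never
# silently skips input. On Python 3 str patterns, \d and \w are Unicode-aware.
TOKEN_RE = re.compile(
    r"""
    (?P<NUMBER>\d+)        # runs of (Unicode) digits
  | (?P<WORD>\w+)          # runs of (Unicode) letters, digits, underscore
  | (?P<PUNCT>[^\w\s])     # any single non-word, non-space character
  | (?P<SPACE>\s+)         # whitespace, discarded below
    """,
    re.VERBOSE,
)


def scan(text: str) -> Iterator[Token]:
    """Yield tokens from text, dropping whitespace."""
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup != "SPACE":
            yield Token(m.lastgroup, m.group())


if __name__ == "__main__":
    print([(t.kind, t.text) for t in scan("café costs 3€")])
    # [('WORD', 'café'), ('WORD', 'costs'), ('NUMBER', '3'), ('PUNCT', '€')]
```

Because the scanner works on `str` objects, accented letters such as "é" come through as part of a single word rather than being split or mangled.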

The only caveat is that, since Chinese and Japanese text doesn't
typically delimit "words" with spaces, I think you'd have to pass the text
through a tokenizer (such as ChaSen for Japanese) before using PyParsing.
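A small sketch of why this matters, assuming a naive scanner that treats a "word" as a run of Unicode word characters (plain `re`, no PyParsing involved). English splits as expected, but an unspaced Japanese sentence comes back as one long token. The segmented string below was split by hand purely for illustration; it is not actual ChaSen output, whose format differs.

```python
import re

# Naive "word" rule: runs of Unicode word characters, broken only at
# whitespace and punctuation. This works for English-style text.
WORD_RE = re.compile(r"\w+")


def words(text: str) -> list:
    return WORD_RE.findall(text)


english = "I am a student."
japanese = "私は学生です。"  # "I am a student." written without spaces
# Hand-segmented for illustration only (NOT real ChaSen output): roughly the
# split a Japanese morphological analyzer would be expected to find.
japanese_segmented = "私 は 学生 です 。"

print(words(english))             # ['I', 'am', 'a', 'student']
print(words(japanese))            # ['私は学生です']
print(words(japanese_segmented))  # ['私', 'は', '学生', 'です']
```

Once the text has been segmented, the same whitespace-driven rules apply unchanged, which is why the segmentation step belongs before the parser rather than inside it.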

Regards,
Ryan Ginstrom
