Parser Generator?
Ryan Ginstrom
software at ginstrom.com
Sun Aug 26 21:05:50 EDT 2007
> On Behalf Of Jason Evans
> Parsers typically deal with tokens rather than individual
> characters, so the scanner that creates the tokens is the
> main thing that Unicode matters to. I have written
> Unicode-aware scanners for use with Parsing-based parsers,
> with no problems. This is pretty easy to do, since Python
> has built-in support for Unicode strings.
The only caveat is that, since Chinese and Japanese scripts don't
typically delimit "words" with spaces, I think you'd have to pass the text
through a tokenizer (like ChaSen for Japanese) before using PyParsing.
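To make the caveat concrete, here is a minimal sketch of a Unicode-aware scanner in Python 3. The token names and patterns are hypothetical, chosen only for illustration; this is not the Parsing or PyParsing API. Because `\w` matches any Unicode letter, unspaced Japanese text comes back as one big WORD token. Only after the text has been split at word boundaries (done by hand here, standing in for what a tokenizer like ChaSen would give you) does the scanner yield one token per word.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token set for illustration. For str patterns, Python 3's
# re module is Unicode-aware by default, so \w and \d match non-ASCII
# letters and digits as well as ASCII ones.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("WORD", r"\w+"),
    ("PUNCT", r"[^\w\s]"),
    ("SPACE", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text: str) -> Iterator[Token]:
    """Yield tokens from text, dropping whitespace."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:  # defensive: the patterns above cover every character
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        if m.lastgroup != "SPACE":
            yield Token(m.lastgroup, m.group())
        pos = m.end()  # every alternative consumes at least one character


# Space-delimited text splits into words as expected.
print(list(scan("parse café 42")))

# Japanese does not put spaces between words, so the whole run of
# kanji and kana before the full stop becomes a single WORD token.
print(list(scan("日本語の文章です。")))

# Hand-segmented text (a hypothetical stand-in for a tokenizer's output)
# with spaces at word boundaries: now each word is its own token.
print(list(scan("日本語 の 文章 です 。")))
```

The scanner itself needs no special handling for Unicode; the hard part for Chinese and Japanese is deciding where the word boundaries are, which is why a separate segmentation step comes first.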
Regards,
Ryan Ginstrom