[Python-Dev] New Py_UNICODE doc

"Martin v. Löwis" martin at v.loewis.de
Tue May 10 20:51:16 CEST 2005


M.-A. Lemburg wrote:
> If all you're interested in is the lexical class of the code points
> in a string, you could use such a codec to map each code point
> to a code point representing the lexical class.

How can I efficiently implement such a codec? The whole point is doing
that in pure Python (because if I had to write an extension module,
I could just as well do the entire lexical analysis in C, without
any regular expressions).

Any kind of associative/indexed table for this task consumes a lot
of memory, and takes quite some time to initialize.
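
For illustration only, here is a minimal sketch of one way the table cost might be reduced. It is not something proposed in this thread. It assumes a modern Python 3, where str.translate accepts any mapping object and a dict subclass can fill itself on demand through __missing__. The names LexicalClassTable and lexical_classes, and the four class markers, are hypothetical. The table starts empty, so there is no up-front initialization, and it only grows by one entry per distinct code point actually seen.

```python
import unicodedata


class LexicalClassTable(dict):
    """Hypothetical lazily filled table: code point -> one-character class marker."""

    def __missing__(self, codepoint):
        # Called by dict lookup only for code points not yet in the table.
        char = chr(codepoint)
        category = unicodedata.category(char)
        if category.startswith("L"):
            marker = "L"  # letters
        elif category == "Nd":
            marker = "D"  # decimal digits
        elif category.startswith("Z") or char.isspace():
            marker = "S"  # separators and other whitespace
        else:
            marker = "O"  # everything else
        self[codepoint] = marker  # cache so the next lookup is a plain dict hit
        return marker


# Shared table (an assumption of this sketch); it fills in as text is processed.
_TABLE = LexicalClassTable()


def lexical_classes(text):
    """Map every code point in text to its class marker."""
    return text.translate(_TABLE)


print(lexical_classes("x1 = é+2"))
```

A regular expression could then be run over the marker string instead of the original text. Whether this is fast enough for a real lexer is untested here.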

Regards,
Martin

