[Python-Dev] Unicode source code

Just van Rossum just@letterror.com
Sun, 9 Feb 2003 18:22:45 +0100


M.-A. Lemburg wrote:

> Now, to accept Unicode it would probably be worthwhile hooking
> into this chain at step 2 rather than step 1 (the code for the
> tokenizer is in Parser/tokenizer.c, the compiler code in
> Python/compile.c); however, this is difficult because most
> APIs for compiling code are built on char* buffers.
>
> A short-term solution would probably be to convert Unicode to
> UTF-8 and prepend a UTF-8 BOM (byte order mark) so that the tokenizer
> knows that it is getting UTF-8. Haven't tested this though.
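The BOM workaround quoted above can be sketched in current Python 3, where compile() accepts a bytes buffer and the tokenizer checks for a leading UTF-8 BOM. This is a minimal illustration under that assumption, not the 2003 code path. The helper name compile_unicode_via_bom is hypothetical.

```python
import codecs


def compile_unicode_via_bom(source, filename="<string>", mode="exec"):
    """Hypothetical helper: compile Unicode text by handing the compiler
    UTF-8 bytes prefixed with a UTF-8 BOM."""
    # Encode the Unicode text to UTF-8, then prepend the BOM
    # (b"\xef\xbb\xbf") so the tokenizer knows which encoding it is reading.
    data = codecs.BOM_UTF8 + source.encode("utf-8")
    return compile(data, filename, mode)


# Usage: run the compiled code and read back a non-ASCII string literal.
namespace = {}
exec(compile_unicode_via_bom("greeting = 'gr\u00fc\u00df dich'\n"), namespace)
greeting = namespace["greeting"]
```

In Python 3, bytes source without a BOM or coding declaration already defaults to UTF-8, so the BOM is redundant there. The trick only matters where the tokenizer's default encoding differs.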

Hm. What I'm looking into now is to simply define a PyCompilerFlags flag
called PyCF_SOURCE_IS_UTF8. eval() and compile() will then convert a
Unicode string to UTF-8 and set this flag. This seems a very low-impact
solution. Does this make sense?
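The flag-based proposal can be modeled in a few lines of Python. This is a sketch, not CPython's implementation. SOURCE_IS_UTF8, prepare_source and tokenizer_decode are hypothetical stand-ins for PyCF_SOURCE_IS_UTF8, the eval()/compile() front end, and the tokenizer. The flag's value is chosen for illustration only. The point is that the caller encodes Unicode to UTF-8 and records that fact in the flags. The tokenizer then trusts the flag instead of sniffing for a BOM or coding declaration.

```python
# Hypothetical bit standing in for the proposed PyCF_SOURCE_IS_UTF8 flag;
# the value is illustrative, not taken from any header.
SOURCE_IS_UTF8 = 0x0100


def prepare_source(source, flags=0):
    """Front end (like eval()/compile()): if given Unicode text, encode it
    to UTF-8 and set the flag; pass byte strings through untouched."""
    if isinstance(source, str):
        return source.encode("utf-8"), flags | SOURCE_IS_UTF8
    return source, flags


def tokenizer_decode(buf, flags):
    """Stand-in tokenizer: when the flag is set, decode as UTF-8 without
    looking for a BOM or coding declaration."""
    if flags & SOURCE_IS_UTF8:
        return buf.decode("utf-8")
    # Without the flag this sketch falls back to Latin-1, an arbitrary
    # choice standing in for "whatever the default source encoding is".
    return buf.decode("latin-1")


# Usage: Unicode input round-trips because the flag travels with the bytes.
buf, flags = prepare_source("s = '\u00e9'\n")
text = tokenizer_decode(buf, flags)
```

The design keeps the char*-based compiler APIs unchanged: only a flag bit is added, which is why the approach is low-impact.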

Just