UTF-8 in source code (Re: [Python-Dev] Internationalization Toolkit)
Tim Peters
tim_one@email.msn.com
Fri, 12 Nov 1999 00:18:09 -0500
[MAL]
> ...
> The conversion goes as follows:
> · for single characters (and this includes all \XXX sequences
>   except \uXXXX), take the ordinal and interpret it as Unicode
>   ordinal
> · for \uXXXX sequences, insert the Unicode character
>   with ordinal 0xXXXX instead
Perfect!
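MAL's two rules can be sketched in modern Python 3 as a decoder over the raw bytes between the quotes. This is a hypothetical illustration, not the actual implementation: `decode_unicode_literal` is an invented name, and only a subset of the ordinary escapes is handled (\n, \t, \r, \\, quotes, \xhh, octal). Python 3's built-in `unicode_escape` codec applies the same "byte ordinal as code point, plus \uXXXX" reading, so it makes a convenient cross-check.

```python
# Hex digits allowed in \xhh and \uXXXX escapes.
_HEX = frozenset(b'0123456789abcdefABCDEF')

# A hypothetical subset of the single-character escapes, not the full list.
_SIMPLE = {ord('n'): '\n', ord('t'): '\t', ord('r'): '\r',
           ord('\\'): '\\', ord("'"): "'", ord('"'): '"'}


def _hex_value(chunk: bytes, width: int) -> int:
    """Parse exactly `width` hex digits, rejecting short or malformed escapes."""
    if len(chunk) != width or not all(c in _HEX for c in chunk):
        raise ValueError(f"bad escape digits: {chunk!r}")
    return int(chunk, 16)


def decode_unicode_literal(body: bytes) -> str:
    """Sketch of the proposed u"..." rule applied to a literal's source bytes."""
    out = []
    i, n = 0, len(body)
    while i < n:
        b = body[i]
        if b != 0x5C or i + 1 == n:
            # Plain byte (or a lone trailing backslash): its ordinal
            # becomes the Unicode code point, i.e. a Latin-1 reading.
            out.append(chr(b))
            i += 1
            continue
        esc = body[i + 1]
        if esc == ord('u'):
            # \uXXXX: insert the character with ordinal 0xXXXX.
            out.append(chr(_hex_value(body[i + 2:i + 6], 4)))
            i += 6
        elif esc == ord('x'):
            # \xhh yields a single ordinal, reused as the code point.
            out.append(chr(_hex_value(body[i + 2:i + 4], 2)))
            i += 4
        elif esc in _SIMPLE:
            out.append(_SIMPLE[esc])
            i += 2
        elif 0x30 <= esc <= 0x37:
            # Octal \ooo: up to three octal digits.
            j = i + 1
            while j < n and j < i + 4 and 0x30 <= body[j] <= 0x37:
                j += 1
            out.append(chr(int(body[i + 1:j], 8)))
            i = j
        else:
            # Unrecognized escape: keep the backslash; the next loop
            # iteration emits the following character unchanged.
            out.append('\\')
            i += 1
    return ''.join(out)
```

For example, `decode_unicode_literal(b'caf\\xe9 \\u20ac')` should agree with `b'caf\\xe9 \\u20ac'.decode('unicode_escape')`, both giving 'café €'.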
[about "raw" Unicode strings]
> ...
> Not sure whether we really need to make this even more complicated...
> The \uXXXX strings look ugly, adding a few \\\\ for e.g. REs or
> filenames won't hurt much in the context of those \uXXXX monsters :-)
Alas, this won't stand over the long term. Eventually people will write
Python using nothing but Unicode strings -- "regular strings" will
eventually become a backward compatibility headache <0.7 wink>. IOW,
Unicode regexps and Unicode docstrings and Unicode formatting ops ...
nothing will escape. Nor should it.
I don't think it all needs to be done at once, though -- existing languages
usually take years to graft in gimmicks to cover all the fine points. So,
happy to let raw Unicode strings pass for now, as a relatively minor point,
but without agreeing it can be ignored forever.
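For comparison, Python 3 makes every `str` a Unicode string and lets the raw prefix apply to all of them. A small illustration, assuming Python 3.3 or later (the `re` module accepts \uXXXX escapes in patterns from that version on), of the backslash doubling MAL mentions versus a raw literal:

```python
import re

# Without a raw prefix every regex backslash must be doubled in the source.
doubled = '\\d+\\.txt'
# With the r prefix the same pattern is written once; no escapes are processed.
raw = r'\d+\.txt'
assert doubled == raw

# A raw literal keeps "\u00e9" as six characters; the re module then
# interprets the \u escape itself when compiling the pattern.
assert len(r'\u00e9') == 6
word = re.compile(r'caf\u00e9')
print(bool(word.fullmatch('caf\u00e9')))   # True
print(bool(re.fullmatch(raw, '42.txt')))   # True
```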
> ...
> BTW, if you want to type in UTF-8 strings and have them converted
> to Unicode, you can use the standard:
>
> u = unicode('...string with UTF-8 encoded characters...','utf-8')
That's what I figured, and thanks for the confirmation.
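The two routes diverge as soon as a UTF-8 editor is involved. A sketch in Python 3, where `str(data, 'utf-8')` (equivalently `data.decode('utf-8')`) plays the role of the Python 2 `unicode(..., 'utf-8')` call, and a Latin-1 decode stands in for the byte-as-ordinal rule for plain characters in a u"..." literal:

```python
# The bytes a UTF-8 editor writes for the text "café".
source_bytes = 'caf\u00e9'.encode('utf-8')   # b'caf\xc3\xa9'

# Byte-as-ordinal reading: each UTF-8 byte becomes its own code point,
# so the two bytes of "é" turn into two unrelated characters.
as_ordinals = source_bytes.decode('latin-1')
print(as_ordinals)   # cafÃ©

# Explicit UTF-8 decoding, the Python 3 spelling of unicode(s, 'utf-8').
u = str(source_bytes, 'utf-8')
print(u)             # café
```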