[Python-Dev] Re: Unicode debate

Just van Rossum just@letterror.com
Fri, 28 Apr 2000 18:51:03 +0100


[GvR, on string.encoding]
>Marc-Andre took this idea a bit further, but I think it's not
>practical given the current implementation: there are too many places
>where the C code would have to be changed in order to propagate the
>string encoding information,

I may be missing something, but the encoding attr just travels with the
string object, no? Like I said in my reply to MAL, I think it's undesirable
to do *anything* with the encoding attr except in combination with a unicode
string.

>and there are too many sources of strings
>with unknown encodings to make it very useful.

That's why the default encoding must be settable as well, as Fredrik suggested.

>Plus, it would slow down 8-bit string ops.

Not if you ignore it most of the time, and just pass it along when
concatenating.
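
To make the "ignore it, just pass it along" idea concrete, here is a rough
sketch in present-day Python, not a proposal for the actual C implementation.
The names TaggedBytes, DEFAULT_ENCODING and to_unicode are made up for
illustration. Two details are assumptions rather than part of the suggestion
above: an untagged operand inherits the other operand's tag, and conflicting
tags are dropped.

```python
# Hypothetical settable default, used only when a string carries no tag.
DEFAULT_ENCODING = "ascii"


class TaggedBytes(bytes):
    """An 8-bit string that carries an optional encoding tag (illustrative)."""

    def __new__(cls, data=b"", encoding=None):
        obj = super().__new__(cls, data)
        obj.encoding = encoding
        return obj

    def __add__(self, other):
        mine = self.encoding
        theirs = getattr(other, "encoding", None)
        if theirs is None or theirs == mine:
            tag = mine      # nothing to reconcile, pass our tag along
        elif mine is None:
            tag = theirs    # assumption: untagged side inherits the tag
        else:
            tag = None      # assumption: conflicting tags are dropped
        # The byte-level concatenation itself never looks at the tag.
        return TaggedBytes(bytes.__add__(self, other), tag)

    def to_unicode(self):
        # The tag is consulted only when converting to Unicode.
        return self.decode(self.encoding or DEFAULT_ENCODING)


s = TaggedBytes(b"caf\xe9", "latin-1") + b"!"
print(s.encoding, s.to_unicode())
```

All other 8-bit operations are inherited from bytes unchanged, so in this
sketch the tag costs nothing until a conversion to Unicode is requested.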

>I have a better idea: rather than carrying around 8-bit strings with
>an encoding, use Unicode literals in your source code.

Explain that to newbies... My guess is that they will want simple 8-bit
strings in their native encoding. Dunno.

>If the source
>encoding is known, these will be converted using the appropriate
>codec.
>
>If you object to having to write u"..." all the time, we could say
>that "..." is a Unicode literal if it contains any characters with the
>top bit on (of course the source file encoding would be used just like
>for u"...").

Only if "\377" would still yield an 8-bit string, for binary goop...
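
A small sketch of that rule, written in present-day Python purely as an
illustration: evaluate_literal is a hypothetical helper, not anything in the
compiler, and the latin-1 default for the source encoding is an assumption.
It takes the raw bytes between the quotes as they appear in the source file.
For simplicity, escapes are only processed in the 8-bit branch.

```python
def evaluate_literal(body, source_encoding="latin-1"):
    """Decide what a plain "..." literal yields under the proposed rule.

    body is the raw source text between the quotes, as bytes.
    """
    if any(byte > 0x7F for byte in body):
        # A character with the top bit on appears literally in the source:
        # treat the literal as Unicode, decoded with the source encoding.
        return body.decode(source_encoding)
    # Pure ASCII source text: process escapes such as \377 and stay 8-bit.
    # Assumption: only escapes that fit in one byte are used here.
    return body.decode("unicode_escape").encode("latin-1")


print(repr(evaluate_literal(rb"\377")))                # 8-bit string
print(repr(evaluate_literal("é".encode("latin-1"))))   # Unicode string
```

The point is that the decision depends on the bytes written in the source
file, not on the value the literal ends up with, so an escaped "\377" can
still be used for binary data.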

Just