[Python-Dev] Encoding of 8-bit strings and Python source code

Guido van Rossum guido@python.org
Tue, 25 Apr 2000 18:35:30 -0400


[Fredrik]
> -- my proposal: expose both types, but let them contain characters
>    from the same character set -- at least when used as strings.
> 
>    as before, 8-bit strings can be used to store binary data, so we
>    don't need a separate ByteArray type.  in an 8-bit string, there's
>    always one character per byte.
> 
> [imho: small changes to the existing code base, about as efficient as
> can be, no attempt to second-guess the user, fully backwards
> compatible, fully compliant with the definition of strings in the language
> reference, patches are available, etc...]

Sorry, all this proposal does is change the default encoding used
when converting between 8-bit strings and Unicode, from UTF-8 to
Latin-1.  That's very western-culture-centric.

You already have control over the encoding: use unicode(s,
"latin-1").  If there are places where you don't have enough control
(e.g. file I/O), let's add control there.
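
A minimal sketch of what "control over the encoding" looks like in
practice.  It is written for present-day Python 3, where
bytes.decode("latin-1") plays the role of unicode(s, "latin-1"); that
mapping, the sample bytes, and the use of io.TextIOWrapper as the file
I/O hook are illustrative assumptions, not something this message
specifies.

```python
import io

# Sample 8-bit data (hypothetical): "café" with one byte per character.
raw = b"caf\xe9"

# Explicit decoding; the Python 3 counterpart of unicode(s, "latin-1").
text = raw.decode("latin-1")

# The same bytes are not valid UTF-8, so the choice of default matters.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Encoding control at the file I/O layer: wrap a byte stream and name
# the encoding explicitly instead of relying on a default.
stream = io.TextIOWrapper(io.BytesIO(raw), encoding="latin-1")
from_file = stream.read()
```

Naming the encoding at each conversion point keeps the choice with the
caller, so no single default has to suit every language.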

--Guido van Rossum (home page: http://www.python.org/~guido/)