[Python-Dev] Allowing u.encode() to return non-strings

Tim Peters tim.peters at gmail.com
Mon Jun 28 20:53:32 EDT 2004


[Bill Janssen]
> Tim, do I understand then that Unicode strings have an implicit
> character encoding, but non-Unicode strings do not?

An 8-bit string is a sequence of 8-bit bytes.  If those bytes are to
"mean something", you have to supply the meaning, or use them in a
context that supplies a specific meaning for you.  This seems nearly
impossible for an American to understand, but non-Americans appear to
know it at birth (if not earlier).
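
To make that concrete, here is a small sketch in present-day Python 3, where the "8-bit string" of this thread corresponds to the bytes type. The byte value 0xE9 and the three codecs are picked purely for illustration; they are not from the original discussion.

```python
# One byte, no inherent meaning: the codec you choose supplies the meaning.
raw = b"\xe9"

as_latin1 = raw.decode("latin-1")  # U+00E9, e with acute accent
as_cp437 = raw.decode("cp437")     # U+0398, Greek capital theta

# Under UTF-8 the same lone byte is not valid text at all.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_latin1), repr(as_cp437), utf8_ok)
```

Same byte, two different characters and one outright error: nothing in the bytes themselves says which reading was intended.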

A Unicode string is, at least in theory, a sequence of Unicode
characters, the latter defined in excruciating detail by the Unicode
Consortium.  There's no conventional sense in which a Unicode string
is an encoding of something other than exactly itself, but you could
certainly make one up.
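
Going the other way, a rough sketch (again Python 3, where str plays the role of the Unicode string; the sample character and codecs are my own choice) shows that the string is just its code points, and bytes only appear once you pick an encoding:

```python
# A Unicode string is a sequence of code points, independent of any byte layout.
text = "\u00e9"  # one character: e with acute accent

code_points = [ord(ch) for ch in text]  # the abstract character numbers

# Bytes show up only when an encoding is chosen, and they differ per encoding.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")
as_utf16le = text.encode("utf-16-le")

print(code_points, as_utf8, as_latin1, as_utf16le)
```

The string itself stays the same throughout; only its encoded byte forms vary.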


