unicode by default

Ian Kelly ian.g.kelly at gmail.com
Thu May 12 12:17:33 EDT 2011


On Thu, May 12, 2011 at 1:58 AM, John Machin <sjmachin at lexicon.net> wrote:
> On Thu, May 12, 2011 4:31 pm, harrismh777 wrote:
>
>>
>> So, the UTF-16 UTF-32 is INTERNAL only, for Python
>
> NO. See one of my previous messages. UTF-16 and UTF-32, like UTF-8 are
> encodings for the EXTERNAL representation of Unicode characters in byte
> streams.

Right.  *Under the hood* Python uses UCS-2 (which is not exactly the
same thing as UTF-16, by the way) to represent Unicode strings.
However, this is entirely transparent.  To the Python programmer, a
unicode string is just an abstraction of a sequence of code-points.
You don't need to think about UCS-2 at all.  The only times you need
to worry about encodings are when you're encoding unicode characters
to byte strings, or decoding bytes to unicode characters, or opening a
stream in text mode; and in those cases the only encoding that matters
is the external one.



More information about the Python-list mailing list