Python 3.2 has some deadly infection

Fri Jun 6 13:33:55 EDT 2014

On 6/6/14 1:11 PM, Marko Rauhamaa wrote:
> Steven D'Aprano <steve+comp.lang.python at pearwood.info>:
>
>> On Fri, 06 Jun 2014 18:32:39 +0300, Marko Rauhamaa wrote:
>>> Unicode, like ASCII, is a code. Representing text in unicode is
>>> encoding.
>>
>> A Unicode string as an abstract data type has no encoding.
>
> Unicode itself is an encoding. See it in action here:
>
>      72 101 108 108 111 44 32 119 111 114 108 100
>
>> It is a Platonic ideal, a pure form like the real numbers.
>
> Far from it. It is a mapping from symbols to integers. The symbols are
> the Platonic ones.
>
> The Unicode/ASCII encoding above represents the same "Platonic" string
> as this EBCDIC one:
>
>      212 133 147 147 150 107 64 166 150 153 137 132
>
>> Unicode string like this:
>>
>> s = u"NOBODY expects the Spanish Inquisition!"
>>
>> should not be thought of as a bunch of bytes in some encoding,
>
> Encoding is not tied to bytes or even computers. People can speak in
> code, after all.
>
>

Marko, you are right about the broader English meaning of the word 
"encoding".  The original point here was that "Unicode text" provides no 
information about what sequence of bytes is at work.
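Here is a minimal Python 3 sketch of Marko's two sequences. The first is just the code points of "Hello, world", which `ord()` returns without any byte encoding being chosen. EBCDIC bytes appear only once a specific code page is picked. The sketch assumes cp037, one of several EBCDIC variants; the post doesn't say which one was meant.

```python
text = "Hello, world"

# Code points: the integers Unicode assigns to each character.
# No byte encoding is chosen in this step.
code_points = [ord(ch) for ch in text]

# Bytes in one particular EBCDIC code page (cp037 is an assumption;
# other EBCDIC variants map some characters differently).
ebcdic = list(text.encode("cp037"))

print(code_points)
print(ebcdic)
```

Under cp037 the EBCDIC list starts with 200 ('H'), and its eleventh value is 147 ('l'). In that code page 212 is 'M' and 137 is 'i', so two values in the quoted EBCDIC sequence appear to be slips.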

In the Unicode ecosystem, an encoding is a specification of how the text 
will be represented in a byte stream.  Saying something is "Unicode" 
doesn't provide that information.  You have to say "UTF-8" or "UTF-16" or 
"UCS-2", etc., before anyone knows which bytes represent the text.
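A minimal Python 3 sketch of this point encodes Steven's example string with a few codecs. Python has no "ucs-2" codec, so UTF-32 stands in as a third example. The little-endian ("-le") variants are chosen only to keep a byte-order mark out of the output.

```python
text = "NOBODY expects the Spanish Inquisition!"

# One abstract string, several different byte representations.
encoded = {codec: text.encode(codec) for codec in ("utf-8", "utf-16-le", "utf-32-le")}

for codec, data in encoded.items():
    # Show the byte count and the first few bytes as hex.
    print(codec, len(data), data[:8].hex())
```

The string has 39 characters. The byte counts come out as 39, 78 and 156, and the leading bytes differ as well, so "Unicode" alone doesn't say which of these byte streams you have.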

When Ethan said, "a Unicode string, as a data type, has no encoding," he 
meant (as he explained) that a Unicode string doesn't require or imply 
any particular mapping to bytes.
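In Python 3 terms, here is a minimal sketch. A `str` won't turn into bytes until you name an encoding, and the same bytes decoded with a different codec give different text. The sample string and the use of Latin-1 as the "wrong" codec are only illustrations.

```python
text = "naïve café"

# A str carries no byte representation of its own: bytes() refuses
# to convert it without being told which encoding to use.
try:
    bytes(text)
except TypeError as exc:
    print("TypeError:", exc)

# Naming the codec makes the mapping explicit and reversible.
data = text.encode("utf-8")
assert data.decode("utf-8") == text

# The same bytes read with a different codec produce different text.
misread = data.decode("latin-1")
print(misread)
```

The last line prints "naÃ¯ve cafÃ©". The bytes never change; only the assumed encoding does.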

I'm sure you understand this; I'm just trying to clarify the different 
meanings of the word "encoding".

> Marko
>

-- 
Ned Batchelder, http://nedbatchelder.com