Python 3.2 has some deadly infection

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Jun 6 11:57:51 EDT 2014


On Fri, 06 Jun 2014 18:32:39 +0300, Marko Rauhamaa wrote:

> Michael Torrie <torriem at gmail.com>:
> 
>> On 06/06/2014 08:10 AM, Marko Rauhamaa wrote:
>>> Ethan Furman <ethan at stoneleaf.us>:
>>>> ASCII is *not* the state of "this string has no encoding" -- that
>>>> would be Unicode; a Unicode string, as a data type, has no encoding.
>>> 
>>> Huh?
>>
>> [...]
>>
>> What part of his statement are you saying "Huh?" about?
> 
> Unicode, like ASCII, is a code. Representing text in unicode is
> encoding.

A Unicode string as an abstract data type has no encoding. It is a 
Platonic ideal, a pure form like the real numbers. There are no bytes, no 
bits, just code points. That is what Ethan means. A Unicode string like 
this:

s = u"NOBODY expects the Spanish Inquisition!"

should not be thought of as a bunch of bytes in some encoding, but as an 
array of code points. Eventually the abstraction will leak, as all 
abstractions do, but not for a very long time.
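
To make the distinction concrete, here is a minimal sketch, assuming 
Python 3 (where plain string literals are already Unicode). The second 
string, t, and all the variable names are illustrative additions, not 
part of the original example. The same code points yield different 
byte sequences depending on which encoding is chosen, and the bytes 
only exist once you ask for them:

```python
s = "NOBODY expects the Spanish Inquisition!"

# Viewed abstractly, the string is a sequence of code points (integers).
code_points = [ord(ch) for ch in s]

# Bytes appear only when an encoding is explicitly applied.
as_utf8 = s.encode("utf-8")
as_utf16 = s.encode("utf-16-le")
as_utf32 = s.encode("utf-32-le")

# One string, three different byte lengths.
print(len(s), len(as_utf8), len(as_utf16), len(as_utf32))

# Hypothetical second example with two non-ASCII code points
# (U+00F3 and U+00F1), written as escapes to avoid ambiguity.
t = "Inquisici\u00f3n espa\u00f1ola"
t_utf8 = t.encode("utf-8")

# len() counts code points for the string and bytes for the encoded form.
print(len(t), len(t_utf8))

# Decoding with the matching codec recovers the same abstract string.
assert as_utf16.decode("utf-16-le") == s
assert t_utf8.decode("utf-8") == t
```

Note that len(s) never changes, whichever encoding is later used to 
write the string out.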


-- 
Steven D'Aprano
http://import-that.dreamwidth.org/


