Language design

Benjamin Kaplan benjamin.kaplan at case.edu
Wed Sep 11 20:54:33 EDT 2013


On Wed, Sep 11, 2013 at 5:37 PM, Mark Janssen <dreamingforward at gmail.com> wrote:
>> Unicode is not 16-bit any more than ASCII is 8-bit. And you used the
>> word "encod[e]", which is the standard way to turn Unicode into bytes
>> anyway. No, a Unicode string is a series of codepoints - it's most
>> similar to a list of ints than to a stream of bytes.
>
> Okay, now you're in blah, blah land.
>
> --mark
> --

There's no such thing as 16-bit Unicode. A Unicode string is a sequence
of characters, not a sequence of bytes. It's an abstract thing. To work
with it on a computer, you need to use a byte encoding because
computers don't deal with abstract things. UTF-16 is one encoding
method that can map any character defined in Unicode to a sequence of
bytes. UTF-16 isn't Unicode; it's just a function that maps a character
string to a byte string. Python's unicode class is a character
string: as far as the user is concerned, it's made up of those
abstract "character" things and not bytes at all.
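
To make that concrete, here is a minimal sketch. It assumes Python 3,
where str plays the role of the unicode class described above (in
Python 2 you would write u"..." literals instead). The sample string
and the variable names are only illustrative. The same three
characters come out as three different byte strings depending on the
encoding you pick, and each one decodes back to the same character
string.

```python
# Assumes Python 3: str here corresponds to Python 2's unicode class.
# Three characters: 'h', 'e' with acute accent (U+00E9), and an emoji
# (U+1F600) that lies outside the Basic Multilingual Plane.
text = "h\u00e9\U0001F600"

# The string itself is a sequence of code points, much like a list of ints.
code_points = [ord(ch) for ch in text]

# Encodings are functions from a character string to a byte string.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # explicit byte order, so no BOM is added
utf32 = text.encode("utf-32-le")

print(len(text), code_points)                # 3 [104, 233, 128512]
print(len(utf8), len(utf16), len(utf32))     # 7 8 12

# Decoding with the matching codec recovers the same abstract string.
assert utf8.decode("utf-8") == text
assert utf16.decode("utf-16-le") == text
assert utf32.decode("utf-32-le") == text
```

Note that the UTF-16 form is 8 bytes, not 6: the emoji needs a
surrogate pair, which is one more reason "16-bit Unicode" doesn't
describe anything real.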



More information about the Python-list mailing list