How do I display unicode value stored in a string variable using ord()

Ian Kelly ian.g.kelly at gmail.com
Sat Aug 18 11:18:39 EDT 2012


(Resending this to the list because I previously sent it only to
Steven by mistake.  Also showing off a case where top-posting is
reasonable, since this bit requires no context. :-)

On Sat, Aug 18, 2012 at 1:41 AM, Ian Kelly <ian.g.kelly at gmail.com> wrote:
>
> On Aug 17, 2012 10:17 PM, "Steven D'Aprano"
> <steve+comp.lang.python at pearwood.info> wrote:
>>
>> Unicode strings are not represented as Latin-1 internally. Latin-1 is a
>> byte encoding, not a unicode internal format. Perhaps you mean to say
>> that they are represented as a single byte format?
>
> They are represented as a single-byte format that happens to be equivalent
> to Latin-1, because Latin-1 is a proper subset of Unicode; every character
> representable in Latin-1 has a byte value equal to its Unicode codepoint.
> This talk of whether it's a byte encoding or a 1-byte Unicode representation
> is then just semantics. Even the PEP refers to the 1-byte representation as
> Latin-1.
>
>>
>> >> I understand the complaint
>> >> to be that while the change is great for strings that happen to fit in
>> >> Latin-1, it is less efficient than previous versions for strings that
>> >> do not.
>> >
>> > That's not the way I interpreted PEP 393.  It takes a pure unicode
>> > string, finds the largest code point in that string, and chooses 1, 2 or
>> > 4 bytes for every character, based on how many bits it'd take for that
>> > largest code point.
>>
>> That's how I interpret it too.
>
> I don't see how this is any different from what I described. Using all 4
> bytes of the code point, you get UCS-4. Truncating to 2 bytes, you get
> UCS-2. Truncating to 1 byte, you get Latin-1.
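
To make the truncation point concrete, here is a minimal sketch, assuming
Python 3. The helper name code_units is hypothetical (it is not part of any
library); it simply writes each code point as a fixed-width little-endian
integer and compares the result with the standard codecs. The UTF-16
comparison only holds because the sample string stays inside the BMP, where
UTF-16 and UCS-2 coincide.

```python
# Every Latin-1 byte value decodes to the code point with the same number.
for b in range(256):
    assert ord(bytes([b]).decode("latin-1")) == b


def code_units(s, width):
    """Write each code point of s as a little-endian integer of `width` bytes.

    Hypothetical helper for illustration; raises OverflowError if a code
    point does not fit in `width` bytes.
    """
    return b"".join(ord(ch).to_bytes(width, "little") for ch in s)


latin = "caf\u00e9"        # largest code point is U+00E9, fits in 1 byte
bmp = "caf\u00e9 \u20ac"   # U+20AC (euro sign) needs 2 bytes

# 1 byte per code point is exactly the Latin-1 encoding.
assert code_units(latin, 1) == latin.encode("latin-1")
# 2 bytes per code point matches UTF-16 for BMP-only text (i.e. UCS-2).
assert code_units(bmp, 2) == bmp.encode("utf-16-le")
# 4 bytes per code point is UCS-4 / UTF-32.
assert code_units(bmp, 4) == bmp.encode("utf-32-le")
```

Note that this compares against explicit little-endian codecs; the in-memory
layout uses the machine's native byte order.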

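The width selection discussed above (1, 2 or 4 bytes, decided by the largest
code point in the string) can be sketched as follows. Both function names are
hypothetical. storage_width only restates the rule from the discussion;
measured_width is a rough check that assumes CPython 3.3 or later, where
sys.getsizeof reflects the per-character storage, and it may report something
different on other implementations.

```python
import sys


def storage_width(s):
    """Bytes per character implied by the largest code point in s.

    Hypothetical helper restating the rule: 1 byte below U+0100,
    2 bytes below U+10000, otherwise 4 bytes.
    """
    largest = max(map(ord, s), default=0)
    if largest < 0x100:
        return 1
    if largest < 0x10000:
        return 2
    return 4


def measured_width(ch, n=1000):
    """Estimate bytes per character by growing a string of `ch` by n characters.

    Assumption: CPython 3.3+; other implementations store strings differently.
    """
    return (sys.getsizeof(ch * (n + 2)) - sys.getsizeof(ch * 2)) // n


for sample in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    print(hex(ord(sample)), storage_width(sample), measured_width(sample))
```

A single character outside Latin-1 is enough to widen every character in the
string, which is the cost the original complaint was about.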

