How do I display unicode value stored in a string variable using ord()

Dave Angel d at davea.name
Fri Aug 17 23:30:22 EDT 2012


On 08/17/2012 08:21 PM, Ian Kelly wrote:
> On Aug 17, 2012 2:58 PM, "Dave Angel" <d at davea.name> wrote:
>> The internal coding described in PEP 393 has nothing to do with latin-1
>> encoding.
> It certainly does. PEP 393 provides for Unicode strings to be represented
> internally as any of Latin-1, UCS-2, or UCS-4, whichever is smallest and
> sufficient to contain the data. I understand the complaint to be that while
> the change is great for strings that happen to fit in Latin-1, it is less
> efficient than previous versions for strings that do not.

That's not the way I interpreted PEP 393.  It takes a pure Unicode
string, finds the largest code point in that string, and chooses 1, 2 or
4 bytes for every character, based on how many bits it'd take for that
largest code point.  Further, I read it to mean that only 00 bytes would
be dropped in the process; no other bytes would be changed.  I take it
as a coincidence that the 1-byte form happens to match latin-1;  that's
the way Unicode happened historically, and is not Python's fault.  Am I
reading it wrong?
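
To make that reading concrete, here is a minimal sketch of the width
choice as I understand it.  pep393_width is a hypothetical helper written
for this post, not a CPython API, and it ignores the per-object header.
The last line checks the "coincidence": code points 0..255 carry the same
numbers as the latin-1 bytes.

```python
def pep393_width(s):
    """Bytes per character that my reading of PEP 393 implies for s.

    Hypothetical helper for illustration; not part of CPython.
    """
    # The largest code point in the string decides the width for all of it.
    top = max(map(ord, s), default=0)
    if top < 0x100:
        return 1   # every code point fits in one byte
    if top < 0x10000:
        return 2   # every code point fits in two bytes
    return 4       # at least one code point needs more than 16 bits


# Code points 0..255 are numbered exactly like the latin-1 byte values.
latin1_matches = all(chr(i).encode("latin-1") == bytes([i]) for i in range(256))

print(pep393_width("fran\u00e7ais"))   # 1
print(pep393_width("\u20ac"))          # 2 (euro sign, U+20AC)
print(pep393_width("\U0001F600"))      # 4
print(latin1_matches)                  # True
```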

I also figure this is going to be at least as space efficient as Python
3.2 for any string which has a max code point of 65535 or less (in
Windows), or 4 billion or less (in real systems).  So unless French has
code points over 64k, I can't figure that anything is lost.
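
A rough way to check that arithmetic, assuming a 3.2 narrow build stores
2-byte UTF-16 code units, a 3.2 wide build stores 4 bytes per character,
and PEP 393 picks 1, 2 or 4 bytes as described above.  payload_bytes is a
hypothetical helper; it counts character data only and ignores object
headers, so treat the numbers as a sketch rather than a measurement.

```python
def payload_bytes(s):
    """Return (narrow_3_2, wide_3_2, pep393) character-data sizes in bytes.

    Hypothetical helper for illustration; object headers are ignored.
    """
    # Narrow build: UTF-16, so code points above 0xFFFF take two 2-byte units.
    narrow = 2 * sum(2 if ord(c) > 0xFFFF else 1 for c in s)
    # Wide build: always 4 bytes per character.
    wide = 4 * len(s)
    # PEP 393 (as I read it): width set by the largest code point in s.
    top = max(map(ord, s), default=0)
    width = 1 if top < 0x100 else 2 if top < 0x10000 else 4
    return narrow, wide, width * len(s)


print(payload_bytes("d\u00e9j\u00e0 vu"))   # (14, 28, 7)
print(payload_bytes("\u20acuro"))           # (8, 16, 8)
```

For the French sample the flexible form is the smallest of the three, and
for a string with a max code point between 256 and 65535 it ties with the
narrow build.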

I have no idea about the times involved, so I wanted a more specific
complaint.

> I don't know how much merit there is to this claim. It would seem to me
> that even in non-western locales, most strings are likely to be Latin-1 or
> even ASCII, e.g.  class and attribute and function names.
>
>

The jmfauth rant I was responding to was saying that French isn't
efficiently encoded, and that performance of some vague operations was
somehow reduced severalfold.  I was just trying to get him to be
more specific.



-- 

DaveA



