How do I display unicode value stored in a string variable using ord()

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sat Aug 18 13:59:18 EDT 2012


On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote:

> Le samedi 18 août 2012 14:27:23 UTC+2, Steven D'Aprano a écrit :
>> [...]
>> The problem with UCS-4 is that every character requires four bytes.
>> [...]
> 
> I'm aware of this (and all the blah blah blah you are explaining). This
> always the same song. Memory.

Exactly. The reason it is always the same song is because it is an 
important song.

 
> Let me ask. Is Python an 'american" product for us-users or is it a tool
> for everybody [*]?

It is a product for everyone, which is exactly why PEP 393 is so 
important. PEP 393 means that users who have only a few non-BMP 
characters don't have to pay the cost of UCS-4 for every single string in 
their application, only for the ones that actually require it. PEP 393 
means that using Unicode strings is now cheaper for everybody.

You seem to be arguing that the way forward is not to make Unicode 
cheaper for everyone, but to make ASCII strings more expensive so that 
everyone suffers equally. I reject that idea.


> Is there any reason why non ascii users are somehow penalized compared
> to ascii users?

Of course there is a reason.

If you want to represent 1114111 different characters in a string, as 
Unicode supports, you can't use a single byte per character, or even two 
bytes. That is a fact of basic mathematics. Supporting 1114111 characters 
must be more expensive than supporting 128 of them.

But why should you carry the cost of 4-bytes per character just because 
someday you *might* need a non-BMP character?



> This flexible string representation is a regression (ascii users or
> not).

No it is not. It is a great step forward to more efficient Unicode.

And it means that now Python can correctly deal with non-BMP characters 
without the nonsense of UTF-16 surrogates:

steve at runes:~$ python3.3 -c "print(len(chr(1114000)))"  # Right!
1
steve at runes:~$ python3.2 -c "print(len(chr(1114000)))"  # Wrong!
2

without doubling the storage of every string.

This is an important step towards making the full range of Unicode 
available more widely.

 
> I recognize in practice the real impact is for many users closed to zero

Then what's the problem?


> (including me) but I have shown (I think) that this flexible
> representation is, by design, not as optimal as it is supposed to be.

You have not shown any real problem at all. 

You have shown untrustworthy, edited timing results that don't match what 
other people are reporting.

Even if your timing results are genuine, you haven't shown that they make 
any difference for real code that does useful work.



-- 
Steven



More information about the Python-list mailing list