How do I display unicode value stored in a string variable using ord()

Chris Angelico rosuav at gmail.com
Sun Aug 19 18:42:53 EDT 2012


On Mon, Aug 20, 2012 at 3:34 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 8/19/2012 4:04 AM, Paul Rubin wrote:
>> I realize the folks who designed and implemented PEP 393 are very smart
>> cookies and considered stuff carefully, while I'm just an internet user
>> posting an immediate impression of something I hadn't seen before (I
>> still use Python 2.6), but I still have to ask: if the 393 approach
>> makes sense, why don't other languages do it?
>
> Python has often copied or borrowed, with adjustments. This time it is the
> first. We will see how it goes, but it has been tested for nearly a year
> already.

Maybe it wasn't consciously borrowed, but whatever innovation is done,
there's usually an obscure beardless language that did it earlier. :)

Pike has a single string type, which can use the full Unicode range.
If all codepoints are <256, the string width is 8 (measured in bits);
if <65536, width is 16; otherwise 32. Using the inbuilt count_memory
function (similar to the Python function used somewhere earlier in
this thread, but which I can't at present put my finger to), I find
that for strings of 16 bytes or more, there's a fixed 20-byte header
plus the string content, stored in the correct number of bytes. (Pike
strings, like Python ones, are immutable and do not need expansion
room.)

However, Python goes a bit further by making it VERY clear that this
is a mere optimization, and that Unicode strings and bytes strings are
completely different beasts. In Pike, it's possible to forget to
encode something before (say) writing it to a socket. Everything works
fine while you have only ASCII characters in the string, and then
breaks when you have a >255 codepoint - or perhaps worse, when you
have a 127<x<256, and the other end misinterprets it.

Really, the only viable alternative to PEP 393 is a fixed 32-bit
representation - it's the only way that's guaranteed to provide
equivalent semantics. The new storage format is guaranteed to take no
more memory than that, and provide equivalent functionality.

ChrisA



More information about the Python-list mailing list