Python Unicode handling wins again -- mostly

Mark Lawrence breamoreboy at yahoo.co.uk
Fri Nov 29 20:07:29 EST 2013


On 30/11/2013 00:44, Steven D'Aprano wrote:
>
> (5) What is the length of "😸😾"?
>
> Both characters U+1F638 (GRINNING CAT FACE WITH SMILING EYES) and U+1F63E
> (POUTING CAT FACE) are outside the Basic Multilingual Plane, which means
> each one needs a surrogate pair (two 16-bit code units) in UTF-16. Most
> programming languages using UTF-16 encodings internally (including
> JavaScript and Java) fail this test. Python 3.3 passes:
>
> py> s = '😸😾'
> py> len(s)
> 2
>
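
For anyone who wants to check the numbers, here is a minimal sketch,
assuming Python 3.3 or later. The variable names are only illustrative.
It compares the code point count that len() reports with the number of
UTF-16 code units, which is roughly what a UTF-16-based language would
report as the length.

```python
# The two cat faces, written as escapes so the source file encoding
# does not matter: U+1F638 and U+1F63E.
s = "\U0001F638\U0001F63E"

# Python 3.3+ counts code points.
code_points = len(s)

# Encode to UTF-16 (little-endian, no BOM); each code unit is 2 bytes.
utf16_units = len(s.encode("utf-16-le")) // 2

# For comparison, the size of the same string in UTF-8.
utf8_bytes = len(s.encode("utf-8"))

print(code_points, utf16_units, utf8_bytes)  # 2 4 8
```

Each character is one code point, but takes two UTF-16 code units and
four UTF-8 bytes, so a length based on UTF-16 code units comes out as 4
instead of 2.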

I couldn't care less whether it passes; it's too slow and uses too much 
memory[1], so please get the completely bug-ridden Python 2 Unicode 
implementation restored at the earliest possible opportunity :)

[1] Because I say so, although I don't actually have any evidence to 
support my case. :) :)

-- 
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence



