Ah Python, you have spoiled me for all other languages
Steven D'Aprano
steve at pearwood.info
Sat May 23 09:01:14 EDT 2015
On Sat, 23 May 2015 10:33 pm, Thomas 'PointedEars' Lahn wrote:
> If only characters were represented as sequences of UTF-16 code units in
> ECMAScript implementations like JavaScript, there would not be a problem
> beyond the BMP;
Are you being sarcastic?
This is Rhino:
js> var c = String.fromCharCode(65535); // in the BMP
js> print(c.charCodeAt(0));
65535
So far so good.
js> var c = String.fromCharCode(65536); // astral character
js> print(c.charCodeAt(0));
0
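The 0 is not really a Rhino quirk. The ECMAScript spec has String.fromCharCode run each argument through ToUint16, which reduces it modulo 2**16, so 65536 silently wraps around to 0. Here is a rough Python sketch of that behaviour. `to_uint16` and `from_char_code` are my own illustrative names, not anything from a real library, and only integer arguments are handled:

```python
def to_uint16(n: int) -> int:
    """Sketch of ECMAScript's ToUint16, integer arguments only.

    NaN, infinities and fractional values are deliberately not handled.
    """
    # Reduce modulo 2**16; Python's % already gives a non-negative
    # result for a positive modulus, which matches the spec for negatives.
    return n % 0x10000


def from_char_code(*codes: int) -> str:
    # Hypothetical imitation of String.fromCharCode: every argument becomes
    # exactly one 16-bit code unit, whatever value was passed in.
    return "".join(chr(to_uint16(c)) for c in codes)


print(ord(from_char_code(65535)))  # 65535, still in the BMP
print(ord(from_char_code(65536)))  # 0, everything above 16 bits is dropped
```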
Can you name any ECMAScript implementation which correctly handles code
points in the supplementary planes, beyond the BMP?
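For reference, "correctly" here means dealing with surrogate pairs. UTF-16 stores a code point above U+FFFF as two code units: subtract 0x10000, put the top 10 bits of the remaining 20-bit value into a high surrogate starting at 0xD800, and put the bottom 10 bits into a low surrogate starting at 0xDC00. A small sketch (the function name is mine, purely for illustration), cross-checked against Python's own UTF-16 encoder:

```python
import struct
from typing import Tuple


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % cp)
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


print([hex(u) for u in to_surrogate_pair(0x10000)])  # ['0xd800', '0xdc00']
print([hex(u) for u in to_surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']

# Big-endian UTF-16 without a BOM, unpacked as two unsigned 16-bit units.
units = struct.unpack(">2H", chr(0x1F600).encode("utf-16-be"))
print(units == to_surrogate_pair(0x1F600))  # True
```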
By the way, for many years Python's so-called "narrow builds" represented
Unicode strings as sequences of UTF-16 code units too:
py> c = unichr(65536)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
Let's try again:
py> c = u'\U00010000' # a single code point
py> len(c)
2
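That 2 is the surrogate pair leaking through: a narrow build counted UTF-16 code units, not code points. Since Python 3.3 (PEP 393) there are no narrow builds, `unichr` has become `chr`, and `str` always counts and indexes by code point. A Python 3 sketch contrasting the two counts, where computing the code-unit count via the encoded length is just my illustration:

```python
c = "\U00010000"          # a single code point, U+10000
print(len(c))              # 1 code point
print(chr(0x10000) == c)   # True; chr() accepts values up to 0x10FFFF

# Number of UTF-16 code units, i.e. what a narrow build reported as len().
utf16_units = len(c.encode("utf-16-le")) // 2
print(utf16_units)         # 2
```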
I'm not saying that it is impossible to have a correct Unicode implementation
using UTF-16, but I've never seen one.
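In principle it isn't hard. A UTF-16-backed string only has to treat a high surrogate followed by a low surrogate as one code point when counting, indexing or iterating. A minimal sketch, assuming the string is given as a list of 16-bit code units and that unpaired surrogates are passed through unchanged (a lenient choice; a strict decoder might raise instead). The function name is hypothetical:

```python
from typing import Iterator, List


def code_points(units: List[int]) -> Iterator[int]:
    """Yield code points from a sequence of UTF-16 code units."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        # A high surrogate followed by a low surrogate encodes one code point.
        if 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # A BMP character, or an unpaired surrogate passed through as-is.
            yield u
            i += 1


units = [0x0041, 0xD800, 0xDC00, 0xD83D, 0xDE00]  # "A", U+10000, U+1F600
print([hex(cp) for cp in code_points(units)])     # ['0x41', '0x10000', '0x1f600']
print(len(units), len(list(code_points(units))))  # 5 3
```

The catch is cost: with variable-width code units, finding the n-th code point means scanning from the start, so indexing is no longer O(1). That is presumably part of why so many implementations simply exposed the code units instead.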
--
Steven
More information about the Python-list mailing list