Ah Python, you have spoiled me for all other languages

Thomas 'PointedEars' Lahn PointedEars at web.de
Sun Jun 7 04:21:15 EDT 2015


Ned Batchelder wrote:

> On Saturday, May 23, 2015 at 9:01:29 AM UTC-4, Steven D'Aprano wrote:
>> On Sat, 23 May 2015 10:33 pm, Thomas 'PointedEars' Lahn wrote:
>> > If only characters were represented as sequences of UTF-16 code units in
>> > ECMAScript implementations like JavaScript, there would not be a
>> > problem beyond the BMP;
>> 
>> Are you being sarcastic?
> 
> IIUC, Thomas' point is that *characters* should be sequences of
> codepoints, not that *strings* should be.

No, my point is that one character should be a sequence of code _units_
(for a code point value).  But in ECMAScript implementations (so far), a
*code point value* equals a character, and that is a problem in ECMAScript
because there the value range is limited to what can be encoded in 16 bits.
The problem starts beyond the BMP, where 16 bits are no longer sufficient
for a code unit sequence and code point value, and where the code unit
sequence and the code point value are no longer equal.
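
To illustrate the difference, here is a minimal sketch in Python 3 (not
ECMAScript).  The helper name utf16_code_units and the two sample characters
are my own choices for illustration; the sketch only shows how the UTF-16
code unit sequence relates to the code point value inside and beyond the
BMP.

```python
def utf16_code_units(ch):
    """Return the list of 16-bit UTF-16 code units that encode ch."""
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2)]


bmp = "\u20ac"          # U+20AC EURO SIGN, inside the BMP
astral = "\U0001F600"   # U+1F600, beyond the BMP

# Inside the BMP: one code unit, equal to the code point value.
print(hex(ord(bmp)), [hex(u) for u in utf16_code_units(bmp)])
# 0x20ac ['0x20ac']

# Beyond the BMP: two code units (a surrogate pair), neither of which
# equals the code point value.
print(hex(ord(astral)), [hex(u) for u in utf16_code_units(astral)])
# 0x1f600 ['0xd83d', '0xde00']

# Python 3.3+ counts code points, so this is still one character.
print(len(astral))
# 1
```

An implementation that treats each 16-bit code unit as a character would
report a length of 2 for the second string, which is the problem I mean.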

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.


