RE Module Performance

Michael Torrie torriem at gmail.com
Wed Jul 24 18:59:15 EDT 2013


On 07/24/2013 04:19 PM, Chris Angelico wrote:
> I'm referring here to objections like jmf's, and also to threads like this:
> 
> http://mozilla.6506.n7.nabble.com/Flexible-String-Representation-full-Unicode-for-ES6-td267585.html
> 
> According to the ECMAScript people, UTF-16 and exposing surrogates to
> the application is a critical feature to be maintained. I disagree.
> But it's not my language, so I'm stuck with it. (I ended up writing a
> little wrapper function in C that detects unpaired surrogates, but
> that still doesn't deal with the possibility that character indexing
> can create a new character that was never there to start with.)

This is starting to drift off topic now, but after reading your
comments on that post, and others' objections, I don't fully understand
why making strings simply "Unicode" in JavaScript breaks compatibility
with older scripts.  What operations performed on strings would break
if Unicode were made an abstract type?  Is it just the input and output
of text, which must be decoded and encoded?  Why should a script care
about the internal representation of Unicode strings?  Is it because
some scripts actually depend on the incorrect behavior of UTF-16 and
the exposed surrogates (and the incorrect indexing that follows)?
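
To make the indexing question concrete, here is a minimal Python 3
sketch that emulates what a UTF-16 string type sees.  The helper names
(utf16_units, is_surrogate) and the sample string are made up for
illustration; this only mimics indexing by 16-bit code units and says
nothing about how any JavaScript engine is actually implemented.

```python
import struct


def utf16_units(s):
    """Return the UTF-16 code units of a str as a list of ints."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def is_surrogate(unit):
    """True if a 16-bit code unit lies in the surrogate range."""
    return 0xD800 <= unit <= 0xDFFF


# 'a', one code point outside the BMP (U+1F600), then 'b'.
text = "a\U0001F600b"
units = utf16_units(text)

# Indexing by code point: three items, each a whole character.
print(len(text), [hex(ord(c)) for c in text])

# Indexing by UTF-16 code unit: four items, and positions 1 and 2
# are halves of a surrogate pair rather than characters.
print(len(units), [hex(u) for u in units])
print([is_surrogate(u) for u in units])
```

A script written against the code-unit view would expect a length of 4
and would find "b" at index 3; under the code-point view the length is
3 and "b" is at index 2.  That difference is the kind of thing an older
script could have come to rely on.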


