Oh look, another language (ceylon)

Steven D'Aprano steve at pearwood.info
Mon Nov 18 21:13:17 EST 2013


On Tue, 19 Nov 2013 10:25:00 +1100, Chris Angelico wrote:

> But the problem is also with strings coming back from JS. 

Just because you call it a "string" in Ceylon, doesn't mean you have to 
use the native Javascript string type unchanged.

Since the Ceylon compiler controls what Javascript operations get called 
(the user never writes any Javascript directly), the compiler can tell 
which operations potentially add surrogates. Since strings are immutable 
in Ceylon, a slice of a BMP-only string is also BMP-only; concatenating 
two BMP-only strings gives a BMP-only string. I expect that uppercasing 
or lowercasing such strings will also keep the same invariant, but if 
not, well, you already have to walk the string to convert it, walking it 
again should be no more expensive.
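
To make that concrete, here is a rough sketch of the bookkeeping, written 
in Python rather than Javascript. Everything in it (the Utf16String 
class, its bmp_only flag, and the method names) is made up for 
illustration and says nothing about how Ceylon actually works. It stores 
UTF-16 code units, as a Javascript engine would, and assumes well-formed 
input with no lone surrogates.

```python
# Hypothetical sketch only: not Ceylon's implementation.
class Utf16String:
    """Immutable text held as UTF-16 code units plus a BMP-only flag."""

    def __init__(self, units, bmp_only):
        self._units = tuple(units)   # 16-bit code units
        self.bmp_only = bmp_only     # True means no surrogate pairs at all

    @classmethod
    def from_text(cls, text):
        data = text.encode("utf-16-le")
        units = [int.from_bytes(data[i:i + 2], "little")
                 for i in range(0, len(data), 2)]
        # One walk over the text at creation time decides the flag.
        return cls(units, all(ord(c) <= 0xFFFF for c in text))

    def to_text(self):
        data = b"".join(u.to_bytes(2, "little") for u in self._units)
        return data.decode("utf-16-le")

    def __len__(self):
        if self.bmp_only:
            return len(self._units)              # fast path, O(1)
        # Slow path: count every code unit except low surrogates.
        return sum(1 for u in self._units if not 0xDC00 <= u <= 0xDFFF)

    def char_at(self, index):
        if self.bmp_only:
            return chr(self._units[index])       # fast path, O(1)
        return self.to_text()[index]             # slow path, O(n) decode

    def concat(self, other):
        # BMP-only + BMP-only stays BMP-only; no rescan needed.
        return Utf16String(self._units + other._units,
                           self.bmp_only and other.bmp_only)

    def slice(self, start, stop):
        if self.bmp_only:
            # A slice of a BMP-only string is BMP-only.
            return Utf16String(self._units[start:stop], True)
        # Otherwise decode, slice by code point, and recompute the flag.
        return Utf16String.from_text(self.to_text()[start:stop])


plain = Utf16String.from_text("hello")
astral = Utf16String.from_text("a\U0001F600b")
mixed = plain.concat(astral)
```

Only strings that really contain surrogate pairs pay for the walk; 
everything else takes the constant-time path.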

The point is not that my off-the-top-of-my-head pseudo-implementation was 
optimal in all details, but that *text strings* should be decent data 
structures with smarts, not dumb arrays of variable-width characters. If 
that means avoiding dumb-array-of-char naive implementations, and writing 
your own, that's part of the compiler writer's job.

Python strings can include null bytes, unlike C, even when built on top 
of C. They know their length, unlike C, even when built on top of C. Just 
because the native Java and Javascript string types don't do these 
things, doesn't mean that they can't be done in Javascript.
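
A quick illustration of the Python side of that claim. The ctypes 
buffer is only there to show what a NUL-terminated C view of the same 
bytes would see; the variable names are arbitrary.

```python
import ctypes

s = "spam\0eggs"

# The embedded NUL is an ordinary character, and the length is stored
# with the string rather than found by scanning for a terminator.
assert len(s) == 9
assert s[4] == "\0"
assert s.split("\0") == ["spam", "eggs"]

# A C-style reading of the same bytes stops at the first NUL.
buf = ctypes.create_string_buffer(b"spam\0eggs")
assert buf.value == b"spam"      # NUL-terminated view
assert len(buf.raw) == 10        # all nine bytes plus the trailing NUL
```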


> - as opposed to simply saying "string
> indexing can be slow on large strings", which puts the cost against a
> visible line of code.

For all we know, Ceylon already does something like this, but merely 
doesn't advertise the fact that while it *can* be slow, it can *also* be 
fast. It's an implementation detail, perhaps, much like string 
concatenation in Python officially requires building a new string, but in 
CPython sometimes it can append to the original string.


Still, given that Pike and Python have already solved this problem, and 
have O(1) indexing and length operations for any Unicode string, whether 
it is BMP-only or includes SMP characters, it is a major disappointment 
that Ceylon doesn't.
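
For the Python half of that, a small check, assuming Python 3.3 or 
later, where strings are indexed by code point. The sample string is 
arbitrary.

```python
s = "a\U0001F600b"       # U+1F600 lies outside the BMP

# Length and indexing count code points, not UTF-16 code units.
assert len(s) == 3
assert s[1] == "\U0001F600"
assert s[2] == "b"

# The same text needs four 16-bit code units in UTF-16, which is what
# a surrogate-based string type would report as its length.
utf16_units = len(s.encode("utf-16-le")) // 2
assert utf16_units == 4
```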



-- 
Steven


