Oh look, another language (ceylon)
Steven D'Aprano
steve at pearwood.info
Mon Nov 18 21:13:17 EST 2013
On Tue, 19 Nov 2013 10:25:00 +1100, Chris Angelico wrote:
> But the problem is also with strings coming back from JS.
Just because you call it a "string" in Ceylon, doesn't mean you have to
use the native Javascript string type unchanged.
Since the Ceylon compiler controls what Javascript operations get called
(the user never writes any Javascript directly), the compiler can tell
which operations potentially add surrogates. Since strings are immutable
in Ceylon, a slice of a BMP-only string is also BMP-only; concatenating
two BMP-only strings gives a BMP-only string. I expect that uppercasing
or lowercasing such strings will also keep the same invariant, but if
not, well, you already have to walk the string to convert it, walking it
again should be no more expensive.
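To make that concrete, here is a rough Python sketch of the bookkeeping I have in mind. It is not how Ceylon does anything: the class TrackedString and its methods are names I made up, the storage is a tuple of UTF-16 code units standing in for a Javascript string, and the input is assumed to be well-formed (no lone surrogates). The flag is cheap to propagate through concatenation and slicing, and indexing only needs the slow walk when the flag is false.

```python
class TrackedString:
    """Hypothetical text type: UTF-16 code units plus a BMP-only flag."""

    def __init__(self, units, bmp_only=None):
        self._units = tuple(units)
        if bmp_only is None:
            # One scan at construction: any surrogate means non-BMP content.
            bmp_only = not any(0xD800 <= u <= 0xDFFF for u in self._units)
        self.bmp_only = bmp_only

    @classmethod
    def from_text(cls, text):
        data = text.encode("utf-16-le")
        return cls(int.from_bytes(data[i:i + 2], "little")
                   for i in range(0, len(data), 2))

    def __len__(self):
        if self.bmp_only:
            return len(self._units)          # one unit per character
        # Otherwise count every unit that is not a low (trailing) surrogate.
        return sum(1 for u in self._units if not 0xDC00 <= u <= 0xDFFF)

    def _unit_offset(self, index):
        """Map a character index to a code-unit offset."""
        if self.bmp_only:
            return index                     # fast path, no walking
        pos = 0
        for _ in range(index):               # slow path: walk the pairs
            pos += 2 if 0xD800 <= self._units[pos] <= 0xDBFF else 1
        return pos

    def char_at(self, index):
        if index < 0:
            raise IndexError("negative indexes not supported in this sketch")
        pos = self._unit_offset(index)
        unit = self._units[pos]
        if 0xD800 <= unit <= 0xDBFF:         # high surrogate: decode the pair
            low = self._units[pos + 1]
            return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
        return chr(unit)

    def concat(self, other):
        # BMP-only + BMP-only stays BMP-only; no rescan needed.
        return TrackedString(self._units + other._units,
                             self.bmp_only and other.bmp_only)

    def slice(self, start, stop):
        units = self._units[self._unit_offset(start):self._unit_offset(stop)]
        # A slice of a BMP-only string is BMP-only; otherwise rescan the piece.
        return TrackedString(units, True if self.bmp_only else None)
```

For example, TrackedString.from_text("hello").char_at(1) takes the fast path, while a string containing an astral character falls back to walking, and only that string pays for it.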
The point is not that my off-the-top-of-my-head pseudo-implementation was
optimal in all details, but that *text strings* should be decent data
structures with smarts, not dumb arrays of variable-width characters. If
that means avoiding dumb-array-of-char naive implementations, and writing
your own, that's part of the compiler writer's job.
Python strings can include null bytes, unlike C, even when built on top
of C. They know their length, unlike C, even when built on top of C. Just
because the native Java and Javascript string types don't do these
things, doesn't mean that they can't be done in Javascript.
> - as opposed to simply saying "string
> indexing can be slow on large strings", which puts the cost against a
> visible line of code.
For all we know, Ceylon already does something like this, but merely
doesn't advertise the fact that while it *can* be slow, it can *also* be
fast. It's an implementation detail, perhaps, much like string
concatenation in Python officially requires building a new string, but in
CPython sometimes it can append to the original string.
Still, given that Pike and Python have already solved this problem, and
have O(1) string indexing operations and length for any Unicode string,
SMP and BMP, it is a major disappointment that Ceylon doesn't.
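For the curious, here is a toy illustration of one way to get O(1) indexing for any Unicode string. It is in the spirit of what CPython 3.3 and later do (PEP 393): store every character of a given string at the same width, chosen by the widest code point in it. The class name FixedWidthText is made up for this post, and the real thing is written in C with more cases than this, so treat it as a sketch only.

```python
class FixedWidthText:
    """Toy fixed-width storage: 1, 2 or 4 bytes per character."""

    def __init__(self, text):
        highest = max(map(ord, text), default=0)
        if highest < 0x100:
            self.width = 1       # Latin-1 range
        elif highest < 0x10000:
            self.width = 2       # BMP only
        else:
            self.width = 4       # at least one astral (SMP etc.) character
        # Every character occupies exactly self.width bytes.
        self._data = b"".join(ord(ch).to_bytes(self.width, "little")
                              for ch in text)

    def __len__(self):
        # O(1): total bytes divided by the fixed width.
        return len(self._data) // self.width

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        # O(1): the offset is plain arithmetic, no walking required.
        start = index * self.width
        return chr(int.from_bytes(self._data[start:start + self.width],
                                  "little"))
```

The cost is memory: a single astral character widens the whole string to four bytes per character, but length and indexing never have to scan.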
--
Steven