Oh look, another language (ceylon)

Chris Angelico rosuav at gmail.com
Mon Nov 18 18:25:00 EST 2013


On Tue, Nov 19, 2013 at 1:30 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> I suppose that's not terrible, except for the O(n) string operations
> which is just dumb. Yes, it's better than buggy, broken strings. But
> still dumb, because those aren't the only choices. For example, for the
> sake of an extra two bytes at the start of each string, they could store
> a flag and a length:

True, but I suspect that _any_ variance from JS strings would have a
significant impact on the performance of everything that crosses the
boundary. If anything, I'd be looking at a permanent 32-bit shim on
the string (rather than the 16-or-32-bit that you describe, or the
16-or-48-bit that Dave clarifies your theory as needing); with one bit
for the flag, that leaves 31 bits of pure binary length, so lengths up
to 2**31-1, and exceeding that could just raise a RuntimeError. Then,
passing any string to a JS method would simply mean trimming off the
first two code units.
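
To make the shim concrete, here's a rough sketch in Python, modelling a
JS string as a list of 16-bit code units. The layout is my assumption,
not anything Ceylon actually does: the top bit of the 32-bit header is a
"contains surrogates" flag, and the other 31 bits hold the length in
code points. All the names (add_shim, read_shim, for_js) are made up for
illustration.

```python
# Hypothetical layout: a fixed 32-bit header stored as two 16-bit code
# units in front of the UTF-16 payload.
#   top bit      -> "payload contains surrogates" flag (assumed)
#   low 31 bits  -> length in code points (assumed)

FLAG_BIT = 1 << 31
MAX_LENGTH = FLAG_BIT - 1  # largest value that fits in 31 bits


def to_utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little")
            for i in range(0, len(raw), 2)]


def add_shim(text):
    """Prefix the payload with the two header code units."""
    units = to_utf16_units(text)
    length = len(text)  # Python counts code points here
    if length > MAX_LENGTH:
        raise RuntimeError("string too long for a 31-bit length")
    # More code units than code points means at least one surrogate pair.
    has_surrogates = len(units) != length
    header = length | (FLAG_BIT if has_surrogates else 0)
    return [header >> 16, header & 0xFFFF] + units


def read_shim(shimmed):
    """Decode the header into (length, has_surrogates) in O(1)."""
    header = (shimmed[0] << 16) | shimmed[1]
    return header & MAX_LENGTH, bool(header & FLAG_BIT)


def for_js(shimmed):
    """What would be handed to JS: the payload minus the two header units."""
    return shimmed[2:]
```

With this, read_shim(add_shim("abc")) gives the length and flag without
touching the payload, and for_js() is just a slice.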

But the problem is also with strings coming back from JS. Every time
you get something crossing from JS to Ceylon, you have to walk it,
count up its length, and see if it has any surrogates (and somehow
deal with mismatched surrogates). Every string, even if all you're
going to do is give it straight back to JS in the next line of code.
Potentially quite expensive, and surprisingly so - as opposed to
simply saying "string indexing can be slow on large strings", which
puts the cost against a visible line of code.
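
For a feel of what that walk involves, here's a minimal sketch in
Python, again treating the incoming JS string as a list of 16-bit code
units. The function name scan_utf16 is made up, and counting each
mismatched surrogate as one code point is just one possible policy that
I'm assuming here; a real implementation might reject or replace them
instead.

```python
def scan_utf16(units):
    """Walk a sequence of UTF-16 code units once.

    Returns (code_point_count, has_surrogates, lone_surrogates).
    Assumed policy: an unpaired surrogate counts as one code point.
    """
    count = 0
    has_surrogates = False
    lone = 0
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:  # high (lead) surrogate
            has_surrogates = True
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # well-formed pair: two units, one code point
            else:
                lone += 1  # lead surrogate with no trail
                i += 1
        elif 0xDC00 <= unit <= 0xDFFF:  # trail surrogate with no lead
            has_surrogates = True
            lone += 1
            i += 1
        else:
            i += 1  # ordinary BMP code unit
        count += 1
    return count, has_surrogates, lone
```

Note that this is a full O(n) pass over every string that comes back,
whether or not anyone ever asks for its length or indexes into it.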

ChrisA


