How to waste computer memory?

Marko Rauhamaa marko at pacujo.net
Fri Mar 18 11:26:31 EDT 2016


Michael Torrie <torriem at gmail.com>:

> On 03/18/2016 02:26 AM, Jussi Piitulainen wrote:
>> I think Julia's way of dealing with its strings-as-UTF-8 [2] is more
>> promising. Indexing is by bytes (1-based in Julia) but the value at a
>> valid index is the whole UTF-8 character at that point, and an
>> invalid index raises an exception.
>
> This seems to me to be a leaky abstraction.

It may be that Python's Unicode abstraction is an untenable illusion
because the underlying reality is 8-bit and there's no way to hide it
completely.

There's no problem providing pure Unicode strings. Things get iffy when
Python's OS abstraction pretends sys.stdin is text or filenames are
strings.
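To make the filename case concrete, here is a minimal sketch of how arbitrary 8-bit data can be carried inside a str with the "surrogateescape" error handler. The helper names (decode_os_bytes, encode_for_os, is_pure_unicode) are made up for illustration, and UTF-8 is assumed as the OS-side encoding:

```python
# Hypothetical helpers; UTF-8 is assumed as the OS-side encoding.

def decode_os_bytes(raw):
    """Decode bytes from the OS, smuggling undecodable bytes as lone surrogates."""
    return raw.decode("utf-8", "surrogateescape")


def encode_for_os(name):
    """Reverse of decode_os_bytes: restores the original bytes exactly."""
    return name.encode("utf-8", "surrogateescape")


def is_pure_unicode(text):
    """True if the str can be encoded strictly, i.e. holds no smuggled bytes."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    raw = b"caf\xe9"              # Latin-1 "café", which is not valid UTF-8
    name = decode_os_bytes(raw)   # "caf\udce9": looks like a str, but is not text
    print(ascii(name))
    print(is_pure_unicode(name))  # the 8-bit reality leaks through here
    print(encode_for_os(name) == raw)
```

The resulting str round-trips to the original bytes, but it is not a pure Unicode string: a strict encode rejects it.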

> Julia's approach is interesting, but it strikes me as somewhat broken
> as it pretends to do O(1) indexing, but in reality it's still O(n)

If the underlying encoding is 8-bit, converting it to a structure with
O(1) indexing would itself be an O(n) operation.
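A rough Python sketch of the two approaches, for comparison. This is not Julia's implementation: char_at and build_offsets are hypothetical names, indexing is 0-based, and the buffer is assumed to hold valid UTF-8:

```python
def char_at(buf, i):
    """Return the whole character whose UTF-8 encoding starts at byte index i.

    Raises IndexError if i is out of range or does not point at the first
    byte of a character. Constant time: at most four bytes are inspected.
    """
    first = buf[i]
    if first < 0x80:
        length = 1
    elif 0xC2 <= first <= 0xDF:
        length = 2
    elif 0xE0 <= first <= 0xEF:
        length = 3
    elif 0xF0 <= first <= 0xF4:
        length = 4
    else:
        # Continuation bytes (0x80-0xBF) and bytes never used in UTF-8.
        raise IndexError("byte index %d is not the start of a character" % i)
    return buf[i:i + length].decode("utf-8")


def build_offsets(buf):
    """Map code point index -> byte offset. One full pass over buf: O(n)."""
    return [i for i, byte in enumerate(buf) if byte & 0xC0 != 0x80]


if __name__ == "__main__":
    data = "a\u00e9\u20ac".encode("utf-8")   # 1-, 2- and 3-byte characters
    offsets = build_offsets(data)             # paid once, up front
    print(offsets)
    print(char_at(data, offsets[2]))          # third code point, O(1) from now on
```

Byte indexing costs nothing up front but exposes invalid indices to the caller. The offset table gives O(1) code point indexing afterwards, at the price of an O(n) scan and extra memory.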


Marko


