uPy Unicode [was Re: Instead of deciding between Python or Lisp blah blah blah]

Wed May 13 01:13:32 EDT 2015

On Wed, May 13, 2015 at 11:23 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Wed, 13 May 2015 03:26 am, Chris Angelico wrote:
>
>> back when MicroPython was debating the implementation of Unicode
>> strings, there was a lengthy discussion on python-dev about whether
>> it's okay for string subscripting to be O(n) instead of O(1), and the
>> final decision was that yes, that's an implementation detail. (UTF-8
>> internal string representation, so iterating over a string would still
>> yield characters in overall O(n), but iterating up to the string's
>> length and subscripting for each character would become O(n*n) on
>> uPy.)
>
> o_O
>
> Got a link to that? I must have missed it.

Links into the python-dev archives are fiddly and tend to break when
the URLs change, and the discussion there was pretty long and rambly
anyway. Probably the best I can do is point you to the tracker issue
where I opened the original question:

https://github.com/micropython/micropython/issues/657

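(The biggest issue was that uPy was, at the time, fundamentally
incompatible with Python's stipulated semantics: string operations
exposed encoded code units rather than whole characters. Imagine all
the problems of a narrow build of CPython <3.3, only more frequent
because it's actually UTF-8, so any non-ASCII character triggers them
rather than just the astral ones.)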
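
To make the narrow-build comparison concrete, here is a rough sketch in
ordinary CPython 3. It is not uPy code: it only assumes that a
byte-oriented str would behave like a bytes object holding UTF-8, and
the helper is_valid_utf8 is a name made up for this illustration.

```python
# Illustration only: compare a real (code point) str with its UTF-8 bytes.
# Assumption: a byte-oriented str acts like this bytes object.

text = "na\u00efve"          # "naive" with i-diaeresis, 5 code points
raw = text.encode("utf-8")   # the same text as 6 UTF-8 bytes

code_points = len(text)      # length as Python-the-language defines it
code_units = len(raw)        # length as a byte-oriented string would report

third_char = text[2]         # the whole character
third_unit = raw[2:3]        # only the lead byte of that character


def is_valid_utf8(chunk: bytes) -> bool:
    """Hypothetical helper: True if chunk decodes cleanly as UTF-8."""
    try:
        chunk.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Here len() disagrees (5 versus 6) and subscripting the bytes yields
half a character, which is the same class of bug a narrow build showed
for astral characters.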

It was finally decided, I think, that Python-the-language didn't
actually mandate O(1) indexing, meaning that a microcontroller (on
which strings aren't going to be gigantic anyway) is welcome to use a
UTF-8 internal representation, with "Hello, world"[4] required to scan
across and count non-continuation bytes to find the right character.
Whether or not uPy actually ended up accepting the requirements of
proper Unicode support I don't know, as I'm no longer involved with
the project.
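
For the curious, the scan described above can be sketched in plain
Python. This is my own illustration, not uPy's actual implementation:
the function name utf8_index is hypothetical, it takes the UTF-8 bytes
directly, and it ignores negative indexes and slices.

```python
def utf8_index(buf: bytes, index: int) -> str:
    """Return the character at code point position `index` of UTF-8 `buf`.

    Hypothetical sketch: walks the buffer from the start and counts
    bytes that are not continuation bytes, so each lookup is O(n).
    """
    if index < 0:
        raise IndexError("negative indexes are not handled in this sketch")

    count = -1      # index of the character we are currently inside
    start = None    # byte offset where the wanted character begins

    for pos, byte in enumerate(buf):
        # Continuation bytes look like 0b10xxxxxx; anything else
        # begins a new character.
        if (byte & 0xC0) != 0x80:
            count += 1
            if count == index:
                start = pos
            elif count == index + 1:
                # The next character begins here, so the wanted one ends.
                return buf[start:pos].decode("utf-8")

    if start is None:
        raise IndexError("string index out of range")
    # The wanted character is the last one in the buffer.
    return buf[start:].decode("utf-8")
```

With this, utf8_index("Hello, world".encode("utf-8"), 4) gives "o".
Calling it once per position in a loop over the string's length rescans
from the start every time, which is where the O(n*n) comes from, while
a plain iteration can walk the bytes once.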

ChrisA