[Python-Dev] Internal representation of strings and Micropython

Wed Jun 4 18:52:17 CEST 2014

On 2014-06-04 14:33, Nick Coghlan wrote:
> On 4 June 2014 15:39,  <dw+python-dev at hmmz.org> wrote:
>> On Wed, Jun 04, 2014 at 03:17:00PM +1000, Nick Coghlan wrote:
>>
>>> There's a general expectation that indexing will be O(1) because
>>> all the builtin containers that support that syntax use it for
>>> O(1) lookup operations.
>>
>> Depending on your definition of built-in, there is at least one
>> standard library container that does not - collections.deque.
>>
>> Given the specialized kinds of application this Python
>> implementation is targeted at, it seems UTF-8 is ideal considering
>> the huge memory savings resulting from the compressed
>> representation, and the reduced likelihood of there being any real
>> need for serious text processing on the device.
>
> Right - I wasn't clear that I think storing text internally as UTF-8
> sounds fine for MicroPython. Anything where the O(N) nature of
> indexing by code point matters probably won't be run in that
> environment anyway.
>
To avoid indexing by code point, you could use some kind of 'cursor' class
to step forwards and backwards along strings. The cursor could track both
the code point index and the byte index.
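A minimal sketch of such a cursor over UTF-8 bytes, assuming the input is already valid UTF-8. The class and method names (Utf8Cursor, forward, backward, peek) are hypothetical and are not MicroPython's implementation. Each step costs O(1) because the cursor reads the sequence length from the lead byte when moving forwards, and skips continuation bytes (0b10xxxxxx) when moving backwards.

```python
class Utf8Cursor:
    """Hypothetical cursor over UTF-8 encoded bytes (assumes valid UTF-8).

    Tracks both the code point index and the byte offset, so stepping by
    one code point never requires rescanning from the start of the string.
    """

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.byte_index = 0  # offset of the first byte of the current code point
        self.char_index = 0  # number of code points before the cursor

    @staticmethod
    def _seq_len(lead: int) -> int:
        # Length of a UTF-8 sequence, derived from its lead byte.
        if lead < 0x80:
            return 1
        if lead >> 5 == 0b110:
            return 2
        if lead >> 4 == 0b1110:
            return 3
        if lead >> 3 == 0b11110:
            return 4
        raise ValueError("invalid UTF-8 lead byte")

    def at_end(self) -> bool:
        return self.byte_index >= len(self.data)

    def peek(self) -> str:
        # Decode only the code point under the cursor.
        n = self._seq_len(self.data[self.byte_index])
        return self.data[self.byte_index:self.byte_index + n].decode("utf-8")

    def forward(self) -> None:
        if self.at_end():
            raise IndexError("cursor at end of string")
        self.byte_index += self._seq_len(self.data[self.byte_index])
        self.char_index += 1

    def backward(self) -> None:
        if self.byte_index == 0:
            raise IndexError("cursor at start of string")
        self.byte_index -= 1
        # Walk back over continuation bytes (0b10xxxxxx) to the lead byte.
        while self.data[self.byte_index] & 0xC0 == 0x80:
            self.byte_index -= 1
        self.char_index -= 1


cur = Utf8Cursor("aé€😀".encode("utf-8"))
cur.forward()
cur.forward()
print(cur.char_index, cur.byte_index, cur.peek())  # prints: 2 3 €
```

Keeping both indices means code point positions can still be reported to the caller, while the byte offset is what actually gets used to access the buffer.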