[Python-Dev] Internal representation of strings and Micropython

Thu Jun 5 09:54:11 CEST 2014

Paul Sokolovsky writes:

 > Please put that in perspective when alarming over O(1) indexing of
 > inherently problematic niche datatype. (Again, it's not my or
 > MicroPython's fault that it was forced as standard string type. Maybe
 > if CPython seriously considered now-standard UTF-8 encoding, results
 > of what is "str" type might be different. But CPython has gigabytes of
 > heap to spare, and for MicroPython, every half-bit is precious).

Would you please stop trolling?  The reasons for adopting Unicode as a
separate data type were good and sufficient in 2000, and they remain
so today, even if you have been fortunate enough not to burn yourself
on character-byte conflation yet.

What matters to you is that str (unicode) is an opaque type -- there
is no specification of the internal representation in the language
reference, and in fact several different ones coexist happily across
existing Python implementations -- and you're free to use a UTF-8
implementation if that suits the applications you expect for
MicroPython.
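To make the point concrete, here is a minimal sketch of a text type that stores UTF-8 internally yet still indexes by code point, as the language requires. The class name `Utf8Str` is hypothetical, and this is not MicroPython's actual code. The cost the quoted message worries about shows up in `_byte_offset`: finding the n-th character means scanning from the start, so indexing is O(n) rather than O(1).

```python
"""Hypothetical sketch: a str-like type stored internally as UTF-8 bytes.

Not MicroPython's implementation; it only shows that the internal
representation can differ while indexing still yields code points.
"""


def _is_lead_byte(b: int) -> bool:
    # UTF-8 continuation bytes look like 0b10xxxxxx; every other byte
    # starts a new code point.
    return (b & 0xC0) != 0x80


class Utf8Str:
    """Immutable text stored as UTF-8 (illustrative only)."""

    def __init__(self, text: str) -> None:
        self._data = text.encode("utf-8")
        # Count code points once, up front, so len() is O(1).
        self._length = sum(1 for b in self._data if _is_lead_byte(b))

    def __len__(self) -> int:
        return self._length

    def _byte_offset(self, index: int) -> int:
        """Byte offset of code point `index`, found by a linear scan: O(n)."""
        count = -1
        for offset, b in enumerate(self._data):
            if _is_lead_byte(b):
                count += 1
                if count == index:
                    return offset
        raise IndexError("string index out of range")

    def __getitem__(self, index: int) -> str:
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("string index out of range")
        start = self._byte_offset(index)
        end = start + 1
        # Step over the continuation bytes belonging to this code point.
        while end < len(self._data) and not _is_lead_byte(self._data[end]):
            end += 1
        return self._data[start:end].decode("utf-8")

    def __str__(self) -> str:
        return self._data.decode("utf-8")


s = Utf8Str("naïve €")
print(len(s), s[2], s[-1])
```

From the caller's side this behaves like str for these operations. Only the cost model changes, which is exactly the freedom an opaque type allows.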

PEP 393 exists, of course, and specifies the current internal
representation for CPython 3 (from 3.3 on).  But I don't see anything
in it that suggests it's mandated for any other implementation.
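For anyone who hasn't read PEP 393: CPython stores each str at one fixed width per string, chosen from the largest code point it contains. A rough sketch of that selection rule follows. The function name `pep393_char_width` is hypothetical, and it models the rule only; it does not inspect CPython's actual objects.

```python
"""Sketch of the per-string storage width rule described in PEP 393."""


def pep393_char_width(text: str) -> int:
    """Bytes per code point a PEP 393 string would use for `text`.

    1 byte if every code point is <= U+00FF (Latin-1 range),
    2 bytes if every code point is <= U+FFFF (Basic Multilingual Plane),
    4 bytes otherwise.
    """
    max_cp = max(map(ord, text), default=0)
    if max_cp <= 0xFF:
        return 1
    if max_cp <= 0xFFFF:
        return 2
    return 4


for sample in ("hello", "café", "€100", "\U0001F600"):
    print(repr(sample), pep393_char_width(sample))
```

Because every character in a given string has the same width, character n sits at byte offset n * width, which is where CPython's O(1) indexing comes from. A UTF-8 representation gives that up in exchange for compactness on mostly-ASCII text.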