[Python-Dev] Internal representation of strings and Micropython

Wed Jun 4 03:17:18 CEST 2014

There is a discussion over at MicroPython about the internal 
representation of Unicode strings. MicroPython is aimed at embedded 
devices, so minimizing memory use is important, possibly even 
more important than performance.

(I'm not speaking on their behalf, just commenting as an interested 
outsider.)

At the moment, their Unicode support is patchy. They are talking about 
either:

* Having a build-time option to restrict all strings to ASCII-only.

  (I think what they mean by that is that strings will be like Python 2 
  strings, ASCII-plus-arbitrary-bytes, not actually ASCII.)

* Implementing Unicode internally as UTF-8, and giving up O(1) 
  indexing operations.

https://github.com/micropython/micropython/issues/657

Would either of these trade-offs be acceptable while still claiming 
"Python 3.4 compatibility"?

My own feeling is that O(1) string indexing is a quality-of-
implementation issue, not a deal breaker for calling it a Python. I 
can't see any requirement in the docs that str[n] must take O(1) time, 
but perhaps I have missed something.
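
To make the UTF-8 option concrete, here is a rough sketch of what 
indexing into a UTF-8 buffer involves. This is purely illustrative and 
is not MicroPython's code: the names utf8_index and _char_width are 
made up for this example, and it assumes the buffer holds valid UTF-8. 
Finding the n-th code point means walking the lead bytes from the 
start, so the cost grows with n rather than being constant.

```python
def _char_width(lead: int) -> int:
    """Return the byte length of a UTF-8 sequence from its lead byte."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2          # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3          # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4          # 11110xxx
    raise ValueError("invalid UTF-8 lead byte")


def utf8_index(buf: bytes, n: int) -> str:
    """Return the n-th code point of a UTF-8 buffer by linear scan.

    Hypothetical helper: assumes buf is valid UTF-8 and n >= 0.
    """
    if n < 0:
        raise IndexError("negative indexes are not handled in this sketch")
    pos = 0
    # Skip n code points one at a time; this loop is the O(n) part.
    for _ in range(n):
        if pos >= len(buf):
            raise IndexError("string index out of range")
        pos += _char_width(buf[pos])
    if pos >= len(buf):
        raise IndexError("string index out of range")
    width = _char_width(buf[pos])
    return buf[pos:pos + width].decode("utf-8")
```

For pure-ASCII data every character is one byte wide, so an 
implementation could in principle special-case that and keep O(1) 
indexing there; whether that is worth the extra code on an embedded 
target is a separate question.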

-- 
Steven