[Python-Dev] Internal representation of strings and Micropython

Fri Jun 6 17:59:31 CEST 2014

On 6/6/2014 4:53 AM, Hrvoje Niksic wrote:
> On 06/04/2014 05:52 PM, Mark Lawrence wrote:

>> Out of idle curiosity is there anything that stops MicroPython, or any
>> other implementation for that matter, from providing views of a string
>> rather than copying every time?  IIRC memoryviews in CPython rely on the
>> buffer protocol at the C API level, so since strings don't support this
>> protocol you can't take a memoryview of them.  Could this actually be
>> implemented in the future, is the underlying C code just too
>> complicated, or what?
>>
>
> Memory view of Unicode strings is controversial for two reasons:
>
> 1. It exposes the internal representation of the string. If memoryviews
> of strings were supported in Python 3, PEP 393 would not have been
> possible (without breaking that feature).
>
> 2. Even if it were OK to expose the internal representation, it might
> not be what the users expect. For example, memoryview("Hrvoje") would
> return a view of a 6-byte buffer, while memoryview("Nikšić") would
> return a view of a 12-byte UCS-2 buffer. The user of a memory view might
> expect to get UCS-2 (or UCS-4, or even UTF-8) in all cases.
>
> An implementation that decided to export strings as memory views might
> be forced to make a decision about internal representation of strings,
> and then stick to it.
>
> Bytes objects don't have these issues, which is why memoryview("foo")
> works just fine in Python 2.7 (where str is a byte string), as does
> memoryview(b"foo") in Python 3.
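
To make the two quoted points concrete, here is a minimal sketch, assuming
CPython 3. The helper try_view and the variable names are made up for
illustration. It shows that bytes export a buffer while str does not, and
that the byte length of a string depends entirely on which encoding you
pick, which is the choice a string memoryview would have to make for you.

```python
def try_view(obj):
    """Return a memoryview of obj, or None if obj does not export a buffer."""
    try:
        return memoryview(obj)
    except TypeError:
        return None

bytes_view = try_view(b"foo")   # bytes support the buffer protocol
str_view = try_view("foo")      # str does not in Python 3, so this is None

# Encoding explicitly makes the byte layout unambiguous.
name = "Nik\u0161i\u0107"       # "Nikšić", six code points
sizes = {enc: len(name.encode(enc))
         for enc in ("utf-8", "utf-16-le", "utf-32-le")}
```

The same six-character name takes a different number of bytes under each
encoding, so there is no single obvious buffer to expose.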

The other problem is that a small slice view of a large object keeps the
large object alive, so a view user needs to think carefully about
whether to make a copy or create a view, and later whether to copy the
views so that the base object can be deleted. This is not for beginners.
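
A small sketch of the keep-alive effect, using a bytearray as a stand-in for
the large object (strings themselves cannot be viewed this way in Python 3).
The names are illustrative only.

```python
big = bytearray(10**6)        # stand-in for a large object
view = memoryview(big)[:10]   # 10-byte slice view, no data copied

del big                       # removes our name, not the object itself
still_alive = len(view.obj)   # the view still references the whole base

small_copy = bytes(view)      # explicit, independent 10-byte copy
view.release()                # drop the view's hold on the base object
```

Only after the copy is made and the view is released can the million-byte
buffer be reclaimed, which is the bookkeeping the view user has to do by hand.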

-- 
Terry Jan Reedy