[Python-Dev] Internal representation of strings and Micropython

Terry Reedy tjreedy at udel.edu
Thu Jun 5 04:25:03 CEST 2014


On 6/4/2014 6:54 PM, Serhiy Storchaka wrote:
> 05.06.14 00:21, Terry Reedy wrote:
>> On 6/4/2014 3:41 AM, Jeff Allen wrote:
>>> Jython uses UTF-16 internally -- probably the only sensible choice in a
>>> Python that can call Java. Indexing is O(N), fundamentally. By
>>> "fundamentally", I mean for those strings that have not yet noticed that
>>> they contain no supplementary (>0xffff) characters.
>>
>> Indexing can be made O(log(k)), where k is the number of astral chars,
>> which is usually small.
>
> I like your idea and think it would be great if Jython implemented
> it.

A proof-of-concept implementation in Python that handles both indexing 
and slicing is on the tracker. It is simpler than I initially expected.
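A minimal Python sketch of the O(log(k)) idea follows. It is not the proof of concept on the tracker; the class name UTF16String is hypothetical, and it assumes well-formed text (no lone surrogates). The string is kept as UTF-16 code units plus a sorted list of the code-point positions of astral characters. Each astral character before position i adds one extra code unit, so a single bisect turns a code-point index into a code-unit offset, for both indexing and slicing.

```python
from bisect import bisect_left


class UTF16String:
    """Hypothetical sketch: UTF-16 code units plus the sorted code-point
    positions of astral (> 0xFFFF) characters. Assumes well-formed text."""

    def __init__(self, text: str) -> None:
        data = text.encode("utf-16-le")
        # Two bytes per code unit; an astral character takes two units.
        self.units = [int.from_bytes(data[j:j + 2], "little")
                      for j in range(0, len(data), 2)]
        # Already sorted, because enumerate walks the text left to right.
        self.astral = [i for i, ch in enumerate(text) if ord(ch) > 0xFFFF]
        self.length = len(text)

    def __len__(self) -> int:
        return self.length

    def _offset(self, i: int) -> int:
        # bisect_left counts astral characters strictly before position i;
        # each one shifts later characters by one code unit. Cost: O(log k).
        return i + bisect_left(self.astral, i)

    @staticmethod
    def _decode(units: list) -> str:
        return b"".join(u.to_bytes(2, "little") for u in units).decode("utf-16-le")

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(self.length)
            if step != 1:
                # Extended slices fall back to per-character lookups.
                return "".join(self[k] for k in range(start, stop, step))
            if stop <= start:
                return ""
            # Both ends are character boundaries, so no surrogate pair is split.
            return self._decode(self.units[self._offset(start):self._offset(stop)])
        if key < 0:
            key += self.length
        if not 0 <= key < self.length:
            raise IndexError("string index out of range")
        j = self._offset(key)
        # A high surrogate means this character occupies two code units.
        width = 2 if 0xD800 <= self.units[j] <= 0xDBFF else 1
        return self._decode(self.units[j:j + width])
```

For example, UTF16String("a\U0001F600b")[2] returns "b": the bisect finds one astral character before position 2, so the lookup reads code unit 3. The search touches only the astral list, so a string with few astral characters stays close to constant-time indexing.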

> Unfortunately, it is too late to do this in CPython.

I mentioned it as an alternative during the PEP 393 discussion. I more than 
half agree that the FSR (flexible string representation) is the better 
choice for CPython, which had no particular attachment to UTF-16 in the 
way that I think Jython, for instance, does.
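For contrast, here is a simplified sketch of how a PEP 393-style flexible string representation picks one fixed width per string from its largest code point, which is what reduces indexing to O(1) offset arithmetic. The function names are hypothetical, and real CPython stores the data in C, in native byte order, with a separate compact-ASCII case; this only models the width choice.

```python
def fsr_kind(text: str) -> int:
    """Bytes per code point a PEP 393-style representation would pick."""
    max_cp = max(map(ord, text), default=0)
    if max_cp < 0x100:
        return 1  # Latin-1 range
    if max_cp < 0x10000:
        return 2  # UCS-2: every code point fits in 16 bits
    return 4  # UCS-4


def fsr_encode(text: str) -> tuple:
    # Every character is stored at the width chosen for the whole string.
    kind = fsr_kind(text)
    data = b"".join(ord(ch).to_bytes(kind, "little") for ch in text)
    return kind, data


def fsr_index(kind: int, data: bytes, i: int) -> str:
    # Fixed width: the byte offset is i * kind, so no search is needed.
    return chr(int.from_bytes(data[i * kind:(i + 1) * kind], "little"))
```

The trade-off is memory: a single astral character widens every character in the string to 4 bytes, whereas the UTF-16 scheme above pays only for the astral characters themselves.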

-- 
Terry Jan Reedy