[Python-Dev] Internal representation of strings and Micropython

Stephen J. Turnbull stephen at xemacs.org
Thu Jun 5 12:00:01 CEST 2014


Serhiy Storchaka writes:

 > Yes, I remember. I think that a hybrid FSR-UTF16 (like FSR, but with UTF-16
 > used instead of UCS4) is the better choice for CPython. I suppose that
 > as emoticons and other icon characters spread over the next 5 or 10
 > years, even English text will often contain astral characters. And
 > spending 4 bytes per character when a long text contains a single astral
 > character looks too wasteful.

Why use something that complex if you don't have to?  For the use case
you have in mind, just map the astral characters into private use
space.  If you really want to be aggressive, use surrogate space, too:
anything that cares what a scalar represents should be trapping on
non-scalars, so you can catch that exception and look up the char.
That is dangerous, though, because such exceptions are probably all
over the place.
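
To make the private-space idea concrete, here is a rough sketch in
Python.  It is only an illustration, not anything CPython does: the
PuaMapper class and its method names are made up for this example, and
it assumes the input text does not already use the BMP Private Use
Area (U+E000..U+F8FF, 6400 code points).  Each distinct astral
character gets the next free private-use code point, so the mapped
text fits in 16-bit units with no surrogate pairs.

```python
PUA_START = 0xE000
PUA_END = 0xF8FF  # inclusive end of the BMP Private Use Area


class PuaMapper:
    """Hypothetical helper: remaps astral code points into the BMP PUA."""

    def __init__(self):
        self._to_pua = {}    # astral code point -> private-use code point
        self._from_pua = {}  # private-use code point -> astral code point

    def _assign(self, astral):
        # Reuse an existing slot, or hand out the next free one.
        pua = self._to_pua.get(astral)
        if pua is None:
            pua = PUA_START + len(self._to_pua)
            if pua > PUA_END:
                raise OverflowError("private use area exhausted")
            self._to_pua[astral] = pua
            self._from_pua[pua] = astral
        return pua

    def encode(self, text):
        """Return text with every astral character replaced by a PUA one."""
        out = []
        for ch in text:
            cp = ord(ch)
            if PUA_START <= cp <= PUA_END:
                # Assumption of this sketch: the PUA is reserved for us.
                raise ValueError("input already uses the private use area")
            if cp > 0xFFFF:
                cp = self._assign(cp)
            out.append(chr(cp))
        return "".join(out)

    def decode(self, text):
        """Inverse of encode(): restore the original astral characters."""
        return "".join(
            chr(self._from_pua.get(ord(ch), ord(ch))) for ch in text
        )


mapper = PuaMapper()
original = "hi \U0001F600!"
packed = mapper.encode(original)
restored = mapper.decode(packed)
```

The table has to travel with the string (or be global), and anything
that inspects character properties must go through it first, which is
the same lookup cost mentioned above for the surrogate-space variant.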




