Flexible string representation, unicode, typography, ...

Sun Sep 2 16:07:35 EDT 2012

On 09/02/2012 03:45 PM, Michael Torrie wrote:
> <jmfauth snipped>:
> In the worst case, Python's strings are as slow as Go because Python
> does the exact same thing as Go, but chooses between three encodings
> instead of just one. Best case scenario, Python's strings could be
> much faster than Go's because indexing through 2 of the 3 encodings is
> O(1) because they are constant-width encodings. If as you say, the
> latin-1 subset of UTF-8 is used, then UTF-8 indexing is O(1) too,
> otherwise it's probably O(n). 

I'm afraid you have it backwards.  The UTF-8 encoding of the
latin-1-compatible characters would be variable length (one byte for
code points below 128, two bytes for 128 through 255).  But my
understanding of the PEP is that the internal one-byte format is simply
the lowest-order byte of each code point, after ensuring that all code
points in the particular string are less than 256.  That's going to
coincidentally resemble latin-1's encoding, but since it's an internal
form, the resemblance is irrelevant.  Anyway, indexing into those
one-byte values is going to be O(1), naturally.

No encoding involved, and no searching or expanding.
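
To make the distinction concrete, here is a rough sketch in pure Python.
It is only an illustration of the idea, not the interpreter's actual C
code; low_bytes is a made-up helper name, and the sample string is
arbitrary.

```python
def low_bytes(text):
    """Hypothetical helper: keep the lowest-order byte of each code point.

    Returns None if any code point is 256 or higher, since such a string
    cannot use a one-byte-per-character form.
    """
    if any(ord(ch) >= 256 for ch in text):
        return None
    return bytes(ord(ch) for ch in text)


sample = "caf\u00e9 \u00ff"      # every code point is below 256
one_byte = low_bytes(sample)
utf8 = sample.encode("utf-8")

# One byte per character, so character i is simply byte i: O(1) indexing.
assert len(one_byte) == len(sample)
assert one_byte[3] == ord(sample[3])

# The bytes happen to match what a latin-1 encoder would produce.
assert one_byte == sample.encode("latin-1")

# UTF-8 needs two bytes for each of the two non-ASCII characters, so the
# byte offset of character i depends on the characters before it.
assert len(utf8) == len(sample) + 2
```

The fixed-width form is what makes the indexing trivial; with the UTF-8
bytes you would have to scan from the start to find the i-th character.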

-- 

DaveA