tuples, index method, Python's design

Sun Apr 15 03:55:09 EDT 2007

"Rhamphoryncus" <rhamph at gmail.com> writes:
> Indexing cost, memory efficiency, and canonical representation: pick
> two.  You can't use a canonical representation (scalar values) without
> some sort of costly search when indexing (O(log n) probably) or by
> expanding to the worst-case size (UTF-32).  Python has taken the
> approach of always providing efficient indexing (O(1)), but you can
> compile it with either UTF-16 (better memory efficiency) or UTF-32
> (canonical representation).

I still don't get it.  UTF-16 is just a data compression scheme, right?
I mean, isn't s[17] character 17 of the (Unicode) string, regardless
of which memory byte it happens to live at?  It could be that accessing
it takes more than constant time, but that's hidden by the implementation.
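The sketch below shows what "hiding" non-constant access would involve if a string were stored as UTF-16 but indexed by code point. It is written in modern Python 3 and simulates the storage by hand. Since Python 3.3 (PEP 393), CPython itself always indexes by code point. The helper names `utf16_units` and `codepoint_at` are hypothetical. Characters outside the Basic Multilingual Plane take two 16-bit units (a surrogate pair), so code point i has no fixed offset. The simplest way to find it is a linear scan from the start, which is O(i). With UTF-32, the offset would simply be i.

```python
def utf16_units(s: str) -> list[int]:
    """Encode s as UTF-16 (little-endian, no BOM) and return its 16-bit code units."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[k:k + 2], "little") for k in range(0, len(data), 2)]


def is_high_surrogate(u: int) -> bool:
    """True if u is the first unit of a surrogate pair."""
    return 0xD800 <= u <= 0xDBFF


def codepoint_at(units: list[int], i: int) -> str:
    """Return code point i of a UTF-16 unit sequence.

    Surrogate pairs occupy two units, so we must walk from the start,
    skipping one or two units per code point: O(i), not O(1)."""
    k = 0
    for _ in range(i):
        if k >= len(units):
            raise IndexError(i)
        k += 2 if is_high_surrogate(units[k]) else 1
    if k >= len(units):
        raise IndexError(i)
    n = 2 if is_high_surrogate(units[k]) else 1
    return b"".join(u.to_bytes(2, "little") for u in units[k:k + n]).decode("utf-16-le")


s = "a\U0001D11Eb"          # 3 code points; U+1D11E is outside the BMP
units = utf16_units(s)       # 4 code units: 0x61, 0xD834, 0xDD1E, 0x62
third = codepoint_at(units, 2)
print(third)                 # the scan finds "b", but units[2] is the low surrogate 0xDD1E
```

A build that indexes code units directly, as the quoted narrow build does, skips this scan. The cost is that `s[2]` then means "unit 2" rather than "character 2".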

So where does the invariant c==s[s.index(c)] fail, assuming s contains c?
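One place the invariant can break on a narrow (UTF-16) build is with a non-BMP character such as U+1D11E. There it is stored as a surrogate pair, so the "one character" c is a string of length 2. `s.index(c)` finds it, but `s[i]` returns a single code unit, namely the lone high surrogate. The sketch below simulates that behaviour over a list of code units. The helpers `narrow_index` and `narrow_getitem` are hypothetical stand-ins for a narrow build's `str.index` and `s[i]`. For any c that is a single code unit, the invariant still holds even on a narrow build.

```python
def utf16_units(s: str) -> list[int]:
    """Encode s as UTF-16 (little-endian, no BOM) and return its 16-bit code units."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[k:k + 2], "little") for k in range(0, len(data), 2)]


def narrow_index(s_units: list[int], c_units: list[int]) -> int:
    """Mimic str.index on a narrow build: locate c as a run of code units."""
    n = len(c_units)
    for i in range(len(s_units) - n + 1):
        if s_units[i:i + n] == c_units:
            return i
    raise ValueError("substring not found")


def narrow_getitem(s_units: list[int], i: int) -> list[int]:
    """Mimic s[i] on a narrow build: always exactly one code unit."""
    return s_units[i:i + 1]


s = "a\U0001D11Eb"
c = "\U0001D11E"                      # one character to the reader, two code units
s_units, c_units = utf16_units(s), utf16_units(c)

i = narrow_index(s_units, c_units)    # found at unit offset 1
narrow_ok = narrow_getitem(s_units, i) == c_units
print(narrow_ok)                      # False: s[i] is the lone high surrogate 0xD834

# Python 3.3+ (PEP 393) indexes by code point, so here the invariant holds.
modern_ok = s[s.index(c)] == c
print(modern_ok)                      # True
```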