tuples, index method, Python's design

Sun Apr 15 14:59:33 EDT 2007

Roel Schroeven <rschroev_nospam_ml at fastmail.fm> writes:
> In this case s[0] is not the full Unicode scalar, but instead just the
> first part of the surrogate pair consisting of 0x1D40 (in s[0]) and
> 0x0000 (in s[1]).

Arrrrgggh.  After much head scratching I think I now understand what
you are saying.  This appears to me to be absolutely nuts.  What is
the purpose of having a unicode string type, if its sequence elements
are not guaranteed to be the unicode characters in the string?  Might
as well use byte strings for everything.

Come to think of it, I don't understand why we have this plethora of
encodings like utf-16.  utf-8 I can sort of understand on pragmatic
grounds, but aside from that I'd think UCS-4 should be used for everything,
and when a space-saving compressed representation is desired, then use
a general purpose data compression algorithm such as gzip.