[Python-Dev] UTF-8 is no fun...

Fredrik Lundh fredrik@pythonware.com
Wed, 12 Apr 2000 11:39:03 +0200


Andy Robinson <andy@reportlab.com> wrote:
> I've spent a fair bit of time converting strings and files the
> last few days, and I'd add that what we have now seems both rock solid
> and very easy to use.

I'm not worried about the core string types or the conversion
machinery; what disturbs me is mostly the use of automagic
conversions to UTF-8, which breaks the fundamental assumption
that a string is a sequence of len(string) characters.

    "The items of a string are characters. There is no
    separate character type; a character is represented
    by a string of one item"

    (from the language reference)
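
To make the concern concrete, here is a minimal sketch of the mismatch. It is written in present-day Python 3 syntax, which is an assumption for illustration only and not the API that was under discussion in this thread. The sample word and all variable names are made up for the example.

```python
# Illustration only: Python 3 syntax, hypothetical names.
text = "na\u00efve"               # five characters; the third is U+00EF
encoded = text.encode("utf-8")    # U+00EF takes two bytes in UTF-8

char_count = len(text)            # counts characters
byte_count = len(encoded)         # counts bytes, so it is one larger here

# Every item of the character string is itself a one-item string,
# which is what the language reference quote describes.
items_are_chars = all(len(text[i]) == 1 for i in range(len(text)))

# Cutting the encoded form at a character position can split a
# multi-byte character, leaving bytes that no longer decode.
truncated = encoded[:3]
try:
    truncated.decode("utf-8")
    split_is_valid = True
except UnicodeDecodeError:
    split_is_valid = False
```

Once a string silently holds the encoded form, len() and indexing count bytes rather than characters, and the one-item-per-character assumption no longer holds.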

I still think the "all strings are sequences of unicode characters"
strawman I posted earlier would simplify things for everyone
involved (programmers, users, and the interpreter itself).

more on this later.  gotta ship some code first.

</F>