Multibyte Character Support for Python

Stephen J. Turnbull stephen at xemacs.org
Thu May 9 06:05:37 EDT 2002


>>>>> "Martin" == Martin v Loewis <martin at v.loewis.de> writes:

    Martin> It would break introspective tools who suddenly find
    Martin> Unicode objects in attribute dictionaries.

What Unicode objects?  They find ordinary strings that are mandated to
be encoded in UTF-8.  The tools only need to be 8-bit clean, and must
not assume that #characters == #octets.  And _that_ only affects
people using non-ASCII identifiers, which might be OK since it is an
extension.
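
To make the #characters vs. #octets point concrete, here is a minimal
sketch in present-day Python.  The identifier "größe" is hypothetical,
picked only because it contains two non-ASCII letters; nothing here is
taken from an actual tool.

```python
# Hypothetical non-ASCII identifier, used only for illustration.
name = "größe"

# The same identifier as an ordinary byte string encoded in UTF-8.
encoded = name.encode("utf-8")

n_chars = len(name)      # counts characters
n_octets = len(encoded)  # counts octets; each of the two non-ASCII letters takes 2

# An 8-bit clean tool passes the octets through untouched, so the
# identifier survives a round trip.
round_trip = encoded.decode("utf-8")

# A tool that assumes one octet per character can cut a character in half.
truncated = encoded[:3]
try:
    truncated.decode("utf-8")
    split_a_character = False
except UnicodeDecodeError:
    split_a_character = True

print(n_chars, n_octets, round_trip == name, split_a_character)
```

For a pure-ASCII identifier the two counts agree, which is why only
users of the extension would notice.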

We do the migration to Unicode objects later, at the same time that
you would have done it anyway.  In the meantime, this fits right in
with the kind of "backwards compatibility" that PEP 263 is all about.

The one catch is that people who want to look at non-ASCII identifiers
must work in UTF-8 environments and not their local encoding.  But that
is OK because non-ASCII identifiers are an extension.
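
A rough sketch of what that looks like in practice, again in
present-day Python.  Latin-1 stands in for "the local encoding" here
purely as an assumption, because it can decode any octet sequence; the
identifier is hypothetical.

```python
# Hypothetical identifier stored as UTF-8 octets, as proposed above.
name = "größe"
stored = name.encode("utf-8")

# Viewing those octets in a non-UTF-8 environment (Latin-1 is assumed
# here as the stand-in local encoding) gives a garbled, longer string.
misread = stored.decode("latin-1")

# The octets themselves are intact, so reinterpreting them as UTF-8
# recovers the original identifier.
recovered = misread.encode("latin-1").decode("utf-8")

print(misread != name, len(misread), recovered == name)
```

Nothing is lost in storage; the damage is only in the display, which
is why working in a UTF-8 environment is enough to avoid it.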

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
 My nostalgia for Icon makes me forget about any of the bad things.  I don't
have much nostalgia for Perl, so its faults I remember.  Scott Gilbert c.l.py
