Multibyte Character Support for Python

Martin v. Loewis martin at v.loewis.de
Thu May 9 04:00:07 EDT 2002


pinard at iro.umontreal.ca (François Pinard) writes:

> > For Python, existing code, like inspect.py, *will* break: if introspective
> > code is suddenly confronted with non-ASCII identifiers, it might break,
> > e.g. if Unicode objects show up as keys in __dict__.
> 
> Should I read that one may not use Unicode strings as dictionary keys?

No, that is certainly possible. Also, a byte string and a Unicode
string have the same hash value and compare equal if the byte string
is the ASCII representation of the Unicode string, so you can use them
interchangeably as dictionary keys.
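
A minimal sketch of what that means in practice (assuming Python 2
semantics and a purely ASCII key; the names are made up for
illustration):

    d = {}
    d['spam'] = 1
    # The ASCII byte string and its Unicode counterpart hash and compare
    # equal, so either spelling finds the same entry.
    print d[u'spam']                      # prints 1
    print hash('spam') == hash(u'spam')   # prints True for ASCII-only text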

It's just that introspective code won't *expect* to find Unicode
objects among the keys of an attribute dictionary, and will likely
fail to process them in a meaningful way.
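
For instance, a tool that assumes every key in __dict__ is a byte
string might silently skip a Unicode key. A contrived sketch (the
class and the non-ASCII attribute name are hypothetical):

    class C:
        pass
    obj = C()
    obj.__dict__[u'h\xf6he'] = 42     # non-ASCII name stored as Unicode

    for name in obj.__dict__:
        if type(name) is str:         # assumption baked into older tools
            print name                # the Unicode key is never reported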

> One would expect Python to support narrow and wide strings equally well.
> In that precise case, `inspect.py' would need to be repaired, indeed.
> A lot of things were "repaired" when Unicode was introduced into
> Python; I see this as perfectly normal.  It is part of the move.

If only inspect.py were affected, that would be fine. However, the
change also affects tools from other people, such as PythonWin, which
"we" (as Python contributors) could not fix as easily.

> Most probably.  If national identifiers get introduced through Latin-1 or
> UTF-8, the problem appears smaller.  But I agree with you that for the
> sake of Python being useful to more countries, it is better to go the
> Unicode way and afford both narrow and wide characters for identifiers.
> This approach would also increase Python self-consistency on charsets.

Indeed, the OP probably would not be happier if Python allowed
Latin-1. Using UTF-8 might reduce the problems, but would be
inconsistent with the rest of the character set support in Python,
where Unicode objects are the data type for
text-with-specified-representation.

Regards,
Martin