BUG? Unicode and Python 2.0

Martin von Loewis loewis at informatik.hu-berlin.de
Sat Jun 16 14:56:01 EDT 2001


"C. Meyer" <kgmeyer at comundo.de> writes:

> The reason is (i think), that askopenfilename surprisingly returns
> unicode string instead of string.

Exactly, that's the problem.

> If functions return sometimes unicode strings, how should this be done?

Depends on what kind of processing you want to do. _tkinter will
return plain strings if the string is all-ASCII, or if the conversion
from the Tcl UTF-8 representation failed (which really indicates a Tcl
bug), and Unicode strings otherwise.
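
To make that rule concrete, here is a rough pure-Python model of it. This is only a sketch of the behaviour described above, not _tkinter's actual C code; the function name model_tkinter_result is made up for illustration, and the code assumes a Python recent enough to have byte-string literals and a decode() method.

```python
def model_tkinter_result(raw):
    """Hypothetical model of the rule above; not _tkinter's real code.

    raw is a byte string holding what Tcl claims is UTF-8 data.
    """
    try:
        decoded = raw.decode("utf-8")
    except UnicodeError:
        # Conversion from UTF-8 failed: hand back the plain string as-is.
        return raw
    if len(decoded) == len(raw):
        # Every character took exactly one byte, so the data is all-ASCII
        # and a plain string is returned.
        return raw
    # At least one non-ASCII character: return a Unicode string.
    return decoded
```

So a color name such as "red" would come back as a plain string, while a file name containing an accented character would come back as Unicode.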

If you know that you sometimes get Unicode, your best bet is to do

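s = unicode(s)

This will do nothing if s is already Unicode, and convert from ASCII
to Unicode if it isn't (raising UnicodeError if a plain string contains
non-ASCII bytes). Then you can uniformly process Unicode strings. The
variable is called s here so that it does not shadow the built-in str.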
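
If the conversion is needed in several places, it can be wrapped in a small helper. The following is a minimal sketch, not anything Tkinter provides: the names to_unicode and text_type are hypothetical, and the try/except is only there so the same code also runs on Python versions where the unicode built-in no longer exists.

```python
try:
    text_type = unicode  # Python 2: the separate Unicode string type
except NameError:
    text_type = str      # Python 3: str is already Unicode


def to_unicode(value, encoding="ascii"):
    """Return value as a Unicode string, decoding plain strings if needed."""
    if isinstance(value, text_type):
        # Already Unicode: nothing to do.
        return value
    # Plain (byte) string: decode it; raises UnicodeError on bad bytes.
    return text_type(value, encoding)


# Stand-in for a value that a Tkinter dialog might have returned.
filename = to_unicode(b"demo.txt")
```

Calling to_unicode on every value that comes back from Tkinter means the rest of the program only ever sees Unicode strings, whichever kind was actually returned.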

Perhaps _tkinter should be changed to uniformly return Unicode, but
that may hurt performance in cases where the string is *known* to be
ASCII-only (e.g. configuration parameters representing numbers or
color names).

IMO the best thing would be if methods that return arbitrary text
always return Unicode strings, but that is hard to analyse, and will
require loads of changes.

> Is this a bug or a design problem, or do i misunderstand this?

It's certainly a bug in the demo; please report it to
sf.net/projects/python. I don't think there is a design problem
anywhere, but perhaps a large number of bugs.

Regards,
Martin



