Tkinter wart: returned texts are sometimes strings, sometimes Unicode strings

Fri Mar 21 05:57:12 EST 2003

Eric Brunel wrote:
   ...
> I'd really like to have opinions from other people who use Tkinter and
> whose native language is not English. But IMHO, this would have been a far
> better idea, at least until Unicode strings can be manipulated exactly the
> same way plain strings are, which is not exactly the case today.

Not exactly, but for example s.encode('utf-8') returns equivalent
plain-string objects whether s is itself an ASCII plain-string object 
or a Unicode string object, and unicode(s) returns equivalent Unicode
objects whether s is an ASCII plain-string object or a Unicode string
object.  So, it's not TOO hard to compensate for Tkinter's attempts
to accommodate the user's convenience (which, like many other attempts
at providing convenience, may well end up getting in one's way, sigh).
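
To make that equivalence concrete, here is a minimal sketch. The
try/except shim is my assumption, there only so the snippet also runs
on Python 3, where str is already Unicode; the variable names are
purely illustrative.

```python
try:
    unicode            # Python 2, the version discussed here
except NameError:
    unicode = str      # assumption: shim so the sketch also runs on Python 3

plain = 'height'       # ASCII plain string
uni = u'height'        # Unicode string with the same contents

# Both spellings normalize to equal objects, whichever type came in.
same_bytes = plain.encode('utf-8') == uni.encode('utf-8')
same_text = unicode(plain) == unicode(uni)
print("%s %s" % (same_bytes, same_text))
```

Note that this holds only for ASCII contents: on Python 2, calling
.encode('utf-8') on a plain string holding non-ASCII bytes first
decodes it implicitly as ASCII and raises UnicodeDecodeError.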

A workaround to ensure Tkinter's widgets' methods return Unicode
strings exclusively is therefore reasonably simple, on the lines of:

>>>
>>> def wrapEnsuringUnicode(f):
...     def wrapper(*args, **kwds):
...         return unicode(f(*args, **kwds))
...     return wrapper
...
>>> import Tkinter
>>> Tkinter.Misc.cget=wrapEnsuringUnicode(Tkinter.Misc.cget)
>>> root = Tkinter.Tk()
>>> root.cget('height')
u'0'
>>>

You could perform such wrapping either dynamically, as above, or
statically, in a modified Tkinter.py of your own or in a subclass.
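
If more than one method needs the treatment, the dynamic approach
could be generalized roughly as follows. This is only a sketch:
wrapMethods and FakeWidget are hypothetical names of mine, and
FakeWidget stands in for a real Tkinter class so the snippet runs
without a display. The unicode shim is again just an assumption to
keep it runnable on Python 3.

```python
try:
    unicode            # Python 2
except NameError:
    unicode = str      # assumption: shim so the sketch also runs on Python 3

def wrapEnsuringUnicode(f):
    def wrapper(*args, **kwds):
        return unicode(f(*args, **kwds))
    return wrapper

def wrapMethods(klass, names):
    """Rebind each named method of klass to a Unicode-returning wrapper."""
    for name in names:
        setattr(klass, name, wrapEnsuringUnicode(getattr(klass, name)))

class FakeWidget(object):
    # Hypothetical stand-in for a Tkinter widget class.
    def cget(self, key):
        return '0'
    def get(self):
        return 'plain ascii'

wrapMethods(FakeWidget, ['cget', 'get'])
w = FakeWidget()
print(repr(w.cget('height')))
```

Only wrap methods that really return strings: unicode() applied to a
tuple, a number or None would quietly turn it into its string form.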

A better fix might be to modify _tkinter.c to avoid the "smart"
way PyTclObject_string now strives to return plain string
objects when all contents are ASCII:

        if (!self->string) {
                s = Tcl_GetStringFromObj(self->value, &len);
                for (i = 0; i < len; i++)
                        if (s[i] & 0x80)
                                break;
#ifdef Py_USING_UNICODE
                if (i == len)
                        /* It is an ASCII string. */
                        self->string = PyString_FromStringAndSize(s, len);
                else {
                        self->string = PyUnicode_DecodeUTF8(s, len, "strict");
                        if (!self->string) {
                                PyErr_Clear();
                                self->string = PyString_FromStringAndSize(s, len);
                        }
                }
#else
                self->string = PyString_FromStringAndSize(s, len);
#endif

down to just:

        if (!self->string) {
                s = Tcl_GetStringFromObj(self->value, &len);
#ifdef Py_USING_UNICODE
                self->string = PyUnicode_DecodeUTF8(s, len, "strict");
                if (!self->string) {
                        PyErr_Clear();
                        self->string = PyString_FromStringAndSize(s, len);
                }
#else
                self->string = PyString_FromStringAndSize(s, len);
#endif
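
For readers who would rather not parse the C, here is a rough Python
model of the two behaviours. It is purely illustrative: the function
names are made up, and a byte string stands in for the char buffer
that Tcl_GetStringFromObj hands back.

```python
def current_conversion(raw):
    """Model of the existing logic: pure-ASCII data stays a plain string."""
    # Mirrors the (s[i] & 0x80) scan in the C code.
    if not any(b & 0x80 for b in bytearray(raw)):
        return raw
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw      # fallback: hand back the undecoded bytes

def proposed_conversion(raw):
    """Model of the simplified logic: always try to decode as UTF-8."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw      # same fallback as before

# ASCII data, valid UTF-8, and bytes that are not valid UTF-8.
for sample in (b'abc', b'\xc3\xa9', b'\xff'):
    print("%r %r" % (current_conversion(sample), proposed_conversion(sample)))
```

The two functions differ only for pure-ASCII input, where the first
returns the plain string unchanged and the second returns a Unicode
string.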

I'm not sure what this could break (indeed, I'm not even sure the
fallback to returning a plain string if decoding as utf-8 fails is
warranted).  But perhaps we're getting into areas more appropriate
for the python-dev list than for the general python list.

Alex