Tkinter wart: returned texts are sometimes strings, sometimes Unicode strings
Alex Martelli
aleax at aleax.it
Fri Mar 21 05:57:12 EST 2003
Eric Brunel wrote:
...
> I'd really like to have opinions from other people who use Tkinter and
> whose native language is not english. But IMHO, this would have been a far
> better idea, at least until Unicode strings can be manipulated exactly the
> same way plain strings are, which is not exactly the case today.
Not exactly, but for example s.encode('utf-8') returns equivalent
plain-string objects whether s is itself an ASCII plain-string object
or a Unicode string object, and unicode(s) returns equivalent Unicode
objects whether s is an ASCII plain-string object or a Unicode string
object. So, it's not TOO hard to compensate for Tkinter's attempts
to accommodate the user's convenience (which, like many other attempts
at providing convenience, may well end up being in one's way, sigh).
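A minimal sketch of that normalization follows. The helper name `to_text` is hypothetical and not part of Tkinter. It assumes Python 2.6 or later (for the `bytes` alias) or Python 3, which goes beyond the original post. It also assumes plain strings hold UTF-8, which gives the same result as `unicode(s)` for pure-ASCII input.

```python
# Hypothetical helper, not part of Tkinter.
try:
    text_type = unicode  # Python 2: the Unicode string type
except NameError:
    text_type = str      # Python 3: str is already Unicode


def to_text(value):
    """Return value as a Unicode string, whether it arrived as a plain
    (byte) string or as a Unicode string."""
    if isinstance(value, text_type):
        return value                      # already Unicode: pass through
    if isinstance(value, bytes):
        # Assumption: plain strings hold UTF-8 (a superset of ASCII).
        return value.decode('utf-8')
    return text_type(value)               # anything else: plain conversion
```

With this helper, an ASCII plain string such as `b'height'` and the Unicode string `u'height'` both come back as the same Unicode object. Encoding either result with `.encode('utf-8')` then yields equal byte strings.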
A workaround that makes Tkinter's widget methods return Unicode
strings exclusively is therefore reasonably simple, along the lines of:
>>>
>>> def wrapEnsuringUnicode(f):
...     def wrapper(*args, **kwds):
...         return unicode(f(*args, **kwds))
...     return wrapper
...
>>> import Tkinter
>>> Tkinter.Misc.cget = wrapEnsuringUnicode(Tkinter.Misc.cget)
>>> root = Tkinter.Tk()
>>> root.cget('height')
u'0'
>>>
You could perform such wrapping dynamically, as above, or statically,
either in a modified Tkinter.py of your own or by inheritance.
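The "by inheritance" option could look roughly like the sketch below. It can run without a display because `FakeEntry` is a hypothetical stand-in for a Tkinter widget. Its `get` mimics the wart by returning a plain string for ASCII text and a Unicode string otherwise. With real Tkinter you would subclass something like `Tkinter.Entry` instead. The wrapper decodes plain strings as UTF-8, which is an assumption. It also targets Python 2.6+ or Python 3, not the Python of the original post.

```python
import functools

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def wrap_ensuring_unicode(f):
    """Wrap f so that its result is always a Unicode string."""
    @functools.wraps(f)
    def wrapper(*args, **kwds):
        result = f(*args, **kwds)
        if isinstance(result, bytes):
            return result.decode('utf-8')  # assumption: UTF-8 plain strings
        return text_type(result)
    return wrapper


class FakeEntry(object):
    """Hypothetical stand-in for a Tkinter widget with mixed return types."""

    def __init__(self, text):
        self._text = text

    def get(self):
        # Mimic the wart: plain (byte) string if pure ASCII, Unicode otherwise.
        try:
            return self._text.encode('ascii')
        except UnicodeEncodeError:
            return self._text


class UnicodeEntry(FakeEntry):
    """Subclass whose get() always returns a Unicode string."""
    get = wrap_ensuring_unicode(FakeEntry.get)
```

Here `FakeEntry(u'abc').get()` returns a plain string, while `UnicodeEntry(u'abc').get()` returns `u'abc'`. Subclassing keeps the change local to the widgets you choose. Patching `Tkinter.Misc` instead affects every widget in the process.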
A better fix might be to modify _tkinter.c to avoid the "smart"
way PyTclObject_string now strives to return plain string
objects when all contents are ASCII:
if (!self->string) {
    s = Tcl_GetStringFromObj(self->value, &len);
    for (i = 0; i < len; i++)
        if (s[i] & 0x80)
            break;
#ifdef Py_USING_UNICODE
    if (i == len)
        /* It is an ASCII string. */
        self->string = PyString_FromStringAndSize(s, len);
    else {
        self->string = PyUnicode_DecodeUTF8(s, len, "strict");
        if (!self->string) {
            PyErr_Clear();
            self->string = PyString_FromStringAndSize(s, len);
        }
    }
#else
    self->string = PyString_FromStringAndSize(s, len);
#endif
    /* ... remainder of the block unchanged ... */
}
down to just:
if (!self->string) {
    s = Tcl_GetStringFromObj(self->value, &len);
#ifdef Py_USING_UNICODE
    self->string = PyUnicode_DecodeUTF8(s, len, "strict");
    if (!self->string) {
        PyErr_Clear();
        self->string = PyString_FromStringAndSize(s, len);
    }
#else
    self->string = PyString_FromStringAndSize(s, len);
#endif
    /* ... remainder of the block unchanged ... */
}
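To see what the proposed change alters, here is a rough Python model of both versions of the C fragment. It is only a sketch, and the function names are hypothetical. Byte strings stand in for the `char *` buffer from `Tcl_GetStringFromObj`. Returning the raw bytes stands in for `PyString_FromStringAndSize`, and `decode('utf-8')` stands in for `PyUnicode_DecodeUTF8(..., "strict")`.

```python
def current_string(raw):
    """Model of the existing code: plain string if every byte is ASCII."""
    if all(b < 0x80 for b in bytearray(raw)):   # no byte has 0x80 set
        return raw                               # PyString_FromStringAndSize
    try:
        return raw.decode('utf-8')               # PyUnicode_DecodeUTF8, strict
    except UnicodeDecodeError:
        return raw                               # PyErr_Clear(); plain fallback


def proposed_string(raw):
    """Model of the proposed code: always try UTF-8 first."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw                               # same plain-string fallback
```

Under this model, ASCII input such as `b'0'` comes back as a plain string from the current logic but as Unicode from the proposed one. Valid non-ASCII UTF-8 becomes Unicode in both. Bytes that are not valid UTF-8, such as `b'\xff'`, still fall back to a plain string in both versions.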
I'm not sure what this could break (indeed, I'm not even sure the
fallback to returning a plain string when decoding as UTF-8 fails is
warranted). But perhaps we're getting into areas more appropriate
for the python-dev list than for the general python list.
Alex
More information about the Python-list mailing list