tkinter + unicode + bug or feature??

Fri Jan 24 22:54:16 EST 2003

Martin v. Löwis wrote:
> Bob van der Poel wrote:
> 
>> BTW, I really think this is a bug. If you enter "ascii" text into the 
>> entry box you get() returns a string, if you enter "extended ascii" 
>> you get a unicode string. And since one can't tell beforehand what the 
>> user is going to enter... Add to this the fact that the behaviour is 
>> not documented in the tkinter reference manual (yes, it is the tcl/tk 
>> manual).
> 
> 
> So what do you think the correct behaviour should be?

Well, since I work mostly in plain ascii, not unicode, I would think the 
correct behaviour would be to return a regular string <type 'str'>. I 
think this is what it was in tcl/tk pre-8.1. And if the community wanted 
to have unicode, that would be fine as well. But, the way it is now one 
never knows if one is going to get a <type 'unicode'> or a 'str'. And 
that isn't right, is it?

>> Well, yes. Being on the US-side (altho I do live in Canada and we're a 
>> bit less centric in our thinking) I was just referring to a "normal" 
>> encoding...whatever that is :)
> 
> 
> There is no such thing.
> 

Yes, as I would have figured if I'd given it any thought :)

>> Yes, locale.getlocale() works fine. Now, if I do use encode on these 
>> strings, will I run into problems if the user's locale is not 
>> encodable into 8 bits? Or can that not happen?
> 
> 
> Depends on what you mean by "8bits". You might have meant to ask
> 
> Q. Could it happen that the user enters characters that cannot be 
> represented in the 'normal encoding'?
> A. Yes, this can happen. If you merely want to compare this to another 
> byte string, you should decode that byte string to Unicode, and perform 
> the comparison then.
> 
> or you meant to ask
> 
> Q. Could it happen that the encoding produces more than one byte per 
> character.
> A. Yes, this can happen, but it is no problem.
> 
> or you meant to ask
> 
> Q. Will Python support 'normal encodings' that produce more than one 
> byte per character out of the box?
> A. No, Python does not ship with any such codecs (*). You should install 
> the JapaneseCodecs, KoreanCodecs, or ChineseCodecs package for that.
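
A minimal sketch of the first answer above (decode the byte string to
Unicode, then compare). The helper name same_text and the "utf-8" default
are assumptions for illustration, not part of Tkinter; pass whatever
encoding the byte string was really written in. It relies on the `bytes`
name, which only exists in Python 2.6 and later.

```python
def same_text(entered, expected, encoding="utf-8"):
    """Compare two values as text, decoding any byte string first.

    `encoding` is an assumption: it must name the encoding the byte
    string was actually produced with.
    """
    def as_text(value):
        # Byte strings are decoded; Unicode strings pass through unchanged.
        if isinstance(value, bytes):
            return value.decode(encoding)
        return value

    return as_text(entered) == as_text(expected)
```

With this, same_text(entrywidget.get(), somestring) gives the same answer
whichever of the two string types get() happens to return.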

What I think I really meant to ask is:

If my program takes strings entered by a user in an Entry() widget and I 
convert that data from a possible unicode string to the encoding of the 
user's current locale, will the result always be a regular string? 
Really, what I'm trying to do is to avoid having my program crash when I 
do something like:

     a = entrywidget.get()
     if a == somestring:
         .....

Currently, 'somestring' IS a regular string. And if 'a' is a unicode 
string the program aborts. So, I'm planning on replacing get() with 
myget() which will just do:

     a = widget.get().encode(userEncoding)

Seems to be a bit of a waste to encode each and every get(), but it is 
probably just as fast to encode as it is to test to see if it is a str.
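
A rough sketch of what such a myget() could look like. The function name
encode_entry_text and the errors="replace" default are assumptions for
illustration. The point it shows: encode() always returns a byte string
when it succeeds, but with the default strict error handling it raises an
exception for any character the target encoding cannot represent, so the
caller has to pick an error policy.

```python
import locale


def encode_entry_text(text, encoding=None, errors="replace"):
    """Return `text` as a byte string in `encoding`.

    Hypothetical helper: if `encoding` is None, the locale's preferred
    encoding is used. With errors="replace", characters that cannot be
    represented are turned into "?" instead of raising.
    """
    if encoding is None:
        # Fall back to ASCII if the locale reports nothing usable.
        encoding = locale.getpreferredencoding(False) or "ascii"
    if isinstance(text, bytes):
        # Already a byte string: nothing to encode.
        return text
    return text.encode(encoding, errors)
```

Usage would be something like a = encode_entry_text(entrywidget.get()).
Passing errors="strict" instead keeps the exception, which may be
preferable if silently replacing characters would hide a real problem.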

And we're sure there isn't a tcl/tk setting to take care of this?

-- 
Bob van der Poel ** Wynndel, British Columbia, CANADA **
EMAIL: bvdpoel at kootenay.com
WWW:   http://www.kootenay.com/~bvdpoel