[Tkinter-discuss] OT: Unicode

Sat Mar 22 03:26:01 CET 2014

Technically this is a Python question, not a Tkinter question, but it's in the context of a Tkinter application so I don't feel *too* guilty about posting it here.

OK. I've got a Tkinter application (running with Python 2.7.2 on Ubuntu 12.04.4 LTS) that needs to handle French accented characters. And it does handle accented characters just fine. I can type an accented character into an Entry and it shows up correctly. I can display it in a Text widget. I can cPickle it to disk and read it back. For example, if I enter e-circumflex (in a Tkinter Entry) and then print it using repr I get:

     u'\xea'

If I look in the cPickled file there are 0xEA's where the e-circumflex characters are. So far so good.
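For reference, here is a minimal sketch of what e-circumflex (U+00EA) looks like on disk under two common encodings. The variable names are just illustrative. It only shows that a single 0xEA byte corresponds to Latin-1, while UTF-8 uses two bytes for the same character:

```python
from __future__ import print_function

E_CIRCUMFLEX = u'\xea'  # the unicode string a Tkinter Entry hands back

# Latin-1 stores the character as the single byte 0xEA.
latin1_bytes = E_CIRCUMFLEX.encode('latin-1')

# UTF-8 stores the same character as the two bytes 0xC3 0xAA.
utf8_bytes = E_CIRCUMFLEX.encode('utf-8')

print(repr(latin1_bytes), len(latin1_bytes))
print(repr(utf8_bytes), len(utf8_bytes))
```

So a hex dump showing a lone 0xEA suggests Latin-1 (or a similar single-byte encoding), whereas 0xC3 0xAA in the same position would suggest UTF-8.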

The problem comes when I need to read into my Tkinter application a file which has accented characters and which was prepared using a text editor like, for example, gedit. The file to be read also has 0xEA's to represent e-circumflex. However, when I read such a file the resulting string then contains u'\cd\xaa' where the e-circumflexes belong. I don't know who is doing the unwanted conversion or how to make it go away. I've tried reading in binary mode, I've tried opening the file using:

     F = codecs.open('temp.txt', encoding='latin-1')

I've tried putting:

     # -*- coding: latin-1 -*-

as the second line of my program. I've tried reading Python/unicode documentation till my eyes went blurry. All to no avail.
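One way to narrow this down is to read the raw bytes and decode them explicitly, so that nothing else gets a chance to convert them. The sketch below rests on an assumption that nothing above confirms: that the file may really be UTF-8 (which I believe is gedit's usual default) rather than Latin-1. The helper name read_text and its encodings parameter are hypothetical, not part of any library:

```python
import io

def read_text(path, encodings=('utf-8', 'latin-1')):
    """Return the file's contents as unicode, trying each encoding in order.

    UTF-8 is tried first because it rejects invalid byte sequences;
    Latin-1 accepts any byte, so it only makes sense as the last resort.
    """
    with io.open(path, 'rb') as f:
        raw = f.read()  # raw bytes, no implicit decoding
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError('none of %r could decode %s' % (encodings, path))
```

If the file holds UTF-8 bytes, decoding them as Latin-1 would turn each e-circumflex into two characters instead of one, which looks a lot like the symptom described. Note also that the coding line at the top of a program only affects how Python reads the source file itself, not the data files the program opens.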

There is probably some really simple solution to this, but so far I've failed to find it.

Thus, if anyone out there in Tkinter land knows the simple solution or could point me to a good source of information I would greatly appreciate it.

Thanks

Cam Farnell