Unicode chr(150) en dash

John Nagle nagle at animats.com
Wed Apr 16 13:27:26 EDT 2008


marexposed at googlemail.com wrote:
> Hello guys & girls
> 
> I'm pasting an "en dash"
> (http://www.fileformat.info/info/unicode/char/2013/index.htm) character into
> a tkinter widget, expecting it to be properly stored into a MySQL database.
> 
> I'm getting this error: 
> *****************************************************************************
>  Exception in Tkinter callback Traceback (most recent call last): File
> "C:\Python24\lib\lib-tk\Tkinter.py", line 1345, in __call__ return
> self.func(*args) File "chupadato.py", line 25, in guardar cursor.execute(a) 
> File "C:\Python24\Lib\site-packages\MySQLdb\cursors.py", line 149, in execute
>  query = query.encode(charset) UnicodeEncodeError: 'latin-1' codec can't
> encode character u'\u2013' in position 52: ordinal not in range(256) 
> *****************************************************************************

     Python and MySQL will do end to end Unicode quite well.  But that's
not what you're doing.  How did "latin-1" get involved?
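
To see why the error mentions "latin-1", here is a minimal sketch that reproduces the encoding step outside MySQLdb. It assumes nothing beyond the standard codecs; the variable names are made up for illustration. It also shows where "chr(150)" comes from: byte 150 is the en dash in Windows code page 1252, not in latin-1.

```python
en_dash = u"\u2013"

# UTF-8 can represent the en dash; it takes three bytes.
utf8_bytes = en_dash.encode("utf-8")

# Python's latin-1 codec only covers code points 0-255, so U+2013
# cannot be encoded. This is the same failure as in the traceback.
try:
    en_dash.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# In Windows cp1252 the en dash is the single byte 0x96 (decimal 150).
cp1252_bytes = en_dash.encode("cp1252")
```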

     If you want to use MySQL in Unicode, there are several things to be done.
First, the connection has to be opened in Unicode:

	import MySQLdb

	# username, password and database are your own settings
	db = MySQLdb.connect(host="localhost",
		use_unicode = True, charset = "utf8",
		user=username, passwd=password, db=database)

Yes, you have to specify both "use_unicode=True", which tells the client
to talk Unicode, and set "charset" to "utf8", which tells the server
to talk Unicode encoded as UTF-8.
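
Once the connection is opened that way, one way to get a pasted string into the database is to hand it to the driver as a query parameter rather than building the SQL text yourself. This is only a sketch: the function name, the table "notes" and the column "body" are hypothetical, and I'm assuming a DB-API cursor such as the one MySQLdb returns, which uses %s placeholders.

```python
def save_text(cursor, text):
    """Insert one Unicode string with a parameterized query.

    `notes` and `body` are hypothetical names used for illustration.
    """
    # The value is passed separately from the SQL text, so the driver
    # does the escaping and encodes it for the connection's charset.
    cursor.execute("INSERT INTO notes (body) VALUES (%s)", (text,))
```

You would call it as save_text(db.cursor(), widget_text) and then db.commit().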

Then the tables need to be in Unicode.  In SQL,

     ALTER DATABASE dbname DEFAULT CHARACTER SET utf8;

before creating the tables.  You can also change the types of
existing tables and even individual fields to utf8, if necessary.
(This takes time for big tables; the table is copied.  But it works.)
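
For existing tables, the statements involved can be sketched as below. The helper name is made up, and it just builds the SQL text so you can look at it before running anything. ALTER TABLE ... CONVERT TO CHARACTER SET is the MySQL form for converting a whole table; changing a single field needs the full column definition restated, so that is left out here.

```python
def utf8_conversion_sql(db_name, table_names):
    """Return the SQL statements to switch a database and its tables to utf8.

    Hypothetical helper. Names are inserted as-is, so only pass
    identifiers you trust.
    """
    # Default for tables created from now on.
    statements = ["ALTER DATABASE %s DEFAULT CHARACTER SET utf8;" % db_name]
    # Existing tables have to be converted one by one (each is copied).
    for table in table_names:
        statements.append(
            "ALTER TABLE %s CONVERT TO CHARACTER SET utf8;" % table)
    return statements
```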

     It's possible to get MySQL to store character sets other than
ASCII or Unicode; you can store data in "latin1" if you want. This
might make sense if, for example, all your data is in French or German,
which maps well to "latin1".  Unless that's your situation, go with
either all-ASCII or all-Unicode.  It's less confusing.

					John Nagle


