Dealing with "funny" characters

John Nagle nagle at animats.com
Sat Oct 20 11:42:15 EDT 2007


Gert-Jan wrote:
> sophie_newbie wrote:
>> Hi, I want to store Python text strings that contain characters like "é" "Č"
>> in a mysql varchar text field. Now my problem is that mysql does not
>> seem to accept these characters. I'm wondering if there is any way I
>> can somehow "encode" these characters to appear as normal characters
>> and then "decode" them when I want to get them out of the database
>> again?
> 
> 
> It seems you'll have to use Unicode in your program rather than 'plain' 
> strings.
> 
> Before storing a Unicode text string in a database or a file, you must 
> encode it using an appropriate encoding/codepage, for example:
> 
> outputstring = unicodeobject.encode('utf-8')

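    No, no, that's wrong.  MySQL and the Python interface to it understand
Unicode.  You don't want to encode data as UTF-8 yourself before putting it in
a database.  If MySQL doesn't know the column holds Unicode, it compares and
sorts the raw bytes under whatever character set it assumes, so indexing,
sorting, and case-insensitive matching won't work properly.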
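
To see what goes wrong with hand-encoded data, here is a small sketch that needs no database at all.  It takes the two characters from the original question, encodes them by hand, and then reads the bytes back as a single-byte character set.  Python's "latin-1" codec is used purely as an illustration; which character set MySQL actually assumes depends on your server settings.

```python
# -*- coding: utf-8 -*-
# The two characters from the question, written as escapes so the
# example does not depend on the editor's encoding.
text = u"\u00e9\u010c"

# What you get if you encode by hand before storing.
utf8_bytes = text.encode("utf-8")

# Each of these characters takes two bytes in UTF-8, so the number of
# characters and the number of bytes disagree.
char_count = len(text)
byte_count = len(utf8_bytes)

# Anything that reads those bytes as a single-byte character set
# (latin-1 here, as an assumption) sees four unrelated characters.
misread = utf8_bytes.decode("latin-1")

# Decoding with the right codec recovers the original text.
recovered = utf8_bytes.decode("utf-8")
```

A column that treats the data as "misread" rather than "text" will count lengths, compare, and sort on the wrong characters.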

    Here's how to do it right.

    First, tell MySQL, before you create your MySQL tables, that the tables are
to be stored in Unicode:

	ALTER DATABASE yourdatabasename DEFAULT CHARACTER SET utf8;

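You can also set the character set table by table, or even for single fields,
but you'll probably get confused if you do.  The database default only applies
to tables created afterward, which is why this step comes first.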
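
If you want to check that the setting took, one option is to ask MySQL what default character set it reports for the database.  This is only a sketch: "database_default_charset" is a made-up helper name, and "cursor" is assumed to be any DB-API cursor on a MySQL connection (MySQLdb uses "%s" placeholders).

```python
def database_default_charset(cursor, database_name):
    """Return the default character set MySQL reports for a database.

    Returns None if the database is not visible to this connection.
    """
    # information_schema.SCHEMATA has one row per database.
    cursor.execute(
        "SELECT DEFAULT_CHARACTER_SET_NAME "
        "FROM information_schema.SCHEMATA WHERE SCHEMA_NAME = %s",
        (database_name,)
    )
    row = cursor.fetchone()
    return row[0] if row else None
```

Run it once after the ALTER DATABASE statement and before creating tables; if the result is not the character set you asked for, new tables will not be created in it by default.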

    Then, when you connect to the database in Python, use something like this:

	import MySQLdb

	db = MySQLdb.connect(host="localhost",
		use_unicode=True, charset="utf8",
		user=username, passwd=password, db=database)

"use_unicode" tells MySQLdb to hand text columns back to Python as Unicode
objects, and "charset" sets the connection character set, which tells the
database that you're talking UTF-8.
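
A quick way to convince yourself the connection is set up properly is a round trip: put a Unicode string in, read it back, and compare.  The sketch below assumes "db" is a connection opened as shown above; "store_and_fetch" and the scratch table "funny_chars" are made-up names for illustration.

```python
def store_and_fetch(db, text):
    """Insert one Unicode string and read it back on the same connection."""
    cur = db.cursor()
    # A temporary table disappears when the connection closes.
    cur.execute(
        "CREATE TEMPORARY TABLE IF NOT EXISTS funny_chars "
        "(id INT AUTO_INCREMENT PRIMARY KEY, s VARCHAR(100)) "
        "DEFAULT CHARACTER SET utf8"
    )
    # Pass the value as a parameter; the driver does the quoting and
    # encoding, so no manual .encode() call is needed.
    cur.execute("INSERT INTO funny_chars (s) VALUES (%s)", (text,))
    # lastrowid is the AUTO_INCREMENT id of the row just inserted.
    cur.execute("SELECT s FROM funny_chars WHERE id = %s", (cur.lastrowid,))
    row = cur.fetchone()
    cur.close()
    return row[0]
```

If everything is configured as described, the value returned should compare equal to the Unicode string you passed in.  If it comes back garbled or as a byte string, check the connection arguments and the table's character set.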

    Within Python, you can use Unicode as well.  If you have a Unicode text
editor, you can create Python source files in Unicode and have Unicode text
constants in quotes.  If you do this, you should put

	# -*- coding: UTF-8 -*-

as the first line of Python files (or the second, if the file starts with a
"#!" line).  Quoted constants should be written
as

	s = u'Test'

rather than

	s = 'Test'

Instead of "str()", use "unicode()".
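
One catch: in Python 2, calling "unicode()" on a byte string without naming an encoding assumes ASCII and raises an error on bytes like those in "é".  A small helper that decodes explicitly avoids that.  This is a sketch; "to_text" is a made-up name, the UTF-8 default is an assumption about where your bytes come from, and the try/except is only there so the same code also runs on Python 3.

```python
try:
    text_type = unicode          # Python 2: the Unicode string type
except NameError:
    text_type = str              # Python 3: str is already Unicode

def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding byte strings explicitly."""
    if isinstance(value, text_type):
        return value                      # already Unicode, leave alone
    if isinstance(value, bytes):
        return value.decode(encoding)     # bytes need a named encoding
    return text_type(value)               # numbers and other objects
```

Use it at the edges of the program, where bytes come in from files or sockets, and keep everything in between as Unicode.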

Once everything is set up like this, you can pass Unicode in and out of MySQL
databases freely, and all the SQL commands will work properly on Unicode data.

					John Nagle

More information about the Python-list mailing list