Unicode from Web to MySQL
Skip Montanaro
skip at pobox.com
Sat Dec 20 23:08:11 EST 2003
Bill> Encoding for example is a UTF-8 page Vietnamese,
Bill> try:
Bill> http://www.rfa.org/service/index.html?service=vie
...
Bill> I've tried grabbing this, doing vietstring.decode(None,'strict')
Bill> gives an error (wants a string, not None), doing
Yeah, I can see that passing None would be a problem. I assume vietstring
is a string object, not a unicode object? If so, try
uviet = unicode(vietstring, "utf-8")
which would give you a unicode object. You can then convert it to other
encodings if you want.
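To make the decode-then-convert step concrete, here is a minimal sketch. It is written in Python 3 syntax, which is an assumption; the session in this thread is Python 2, where the equivalent call is unicode(vietstring, "utf-8"). The sample phrase is hypothetical and stands in for the bytes fetched from the page.

```python
# Python 3 sketch: bytes plays the role of the Python 2 string object,
# and str plays the role of the Python 2 unicode object.

# Hypothetical sample text standing in for the fetched page content.
sample_text = "Ti\u1ebfng Vi\u1ec7t"
vietstring = sample_text.encode("utf-8")

# Decode the UTF-8 bytes into a text (unicode) object.
uviet = vietstring.decode("utf-8")

# Once decoded, the text can be converted to other encodings.
as_utf16 = uviet.encode("utf-16-le")

# An encoding that lacks some of the characters needs an error policy
# such as "replace", which substitutes "?" for each unencodable character.
as_latin1 = uviet.encode("latin-1", "replace")
```

The "replace" policy loses information, so it is only useful when the target encoding is fixed and an approximate result is acceptable.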
Furthermore, if you know the string is already encoded as utf-8, you should
be able to just stuff it in your database as-is. I store utf-8 strings in
MySQL all the time.
Bill> unicode(data,'unicode','replace') fails,
Bill> unicode(data,'raw-unicode-escape','replace') somewhat works,
Bill> I can then try
Bill> unicode(data,'raw-unicode-escape','replace').encode('utf-8')
Bill> but I get a SQL error at that point.
Bill> (SQL statement is:
Bill> ' Insert INTO test_utf8 (title) VALUES ( '%s') ' % data2
Bill> which if I put straight ascii text of some 1000 characters or so has
Bill> no problem, Vietnamese gives me the SQL error.
Can you paste an actual Python session into an email?
Here's an example. Table looks like this:
mysql> describe testac;
+--------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| object_id | int(11) | YES | | NULL | |
| object_title | varchar(64) | YES | | NULL | |
+--------------+-------------+------+-----+---------+-------+
2 rows in set (0.03 sec)
Code looks like this:
>>> conn = MySQLdb.Connect(host="localhost", db="test", user="someuser",
... passwd="somepasswd")
>>> c = conn.cursor()
>>> s = u'\u1234'
>>> s
u'\u1234'
>>> s.encode('utf8')
'\xe1\x88\xb4'
>>> c.execute('insert into testac values (47, %s)', (s,))
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/local/lib/python2.2/site-packages/MySQLdb/cursors.py", line 95, in execute
return self._execute(query, args)
File "/usr/local/lib/python2.2/site-packages/MySQLdb/cursors.py", line 114, in _execute
self.errorhandler(self, exc, value)
File "/usr/local/lib/python2.2/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
raise errorclass, errorvalue
UnicodeError: Latin-1 encoding error: ordinal not in range(256)
Note that directly inserting a Unicode object fails because MySQLdb tries
to encode it with its default encoding (in this case Latin-1), and the
Latin-1 charset doesn't have an encoding for that code point.
>>> c.execute('insert into testac values (47, %s)', (s.encode('utf-8'),))
1L
But encoding it as utf-8 succeeds.
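The difference between the two encodings can be reproduced without a database. The sketch below uses Python 3 syntax (an assumption; the session above is Python 2) and the same code point as the session.

```python
s = "\u1234"  # same code point as in the session above

# Latin-1 only covers code points 0-255, so this encode attempt fails.
try:
    s.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can represent any code point; this one takes three bytes.
utf8_bytes = s.encode("utf-8")
```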
>>> c.execute('select * from testac where object_id=47')
1L
>>> c.fetchall()
((47L, '\xe1\x88\xb4'),)
And fetching that row shows that the object_title field does indeed have the
three-byte utf-8 encoded string.
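The same round trip can be sketched end to end in a self-contained form. This example is an assumption-laden stand-in: it uses Python 3 and the standard library's sqlite3 module with an in-memory database instead of MySQLdb, so the placeholder is ? rather than %s, and the storage behaviour of a real MySQL server may differ.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL "test" database.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("create table testac (object_id integer, object_title varchar(64))")

s = "\u1234"

# Encode explicitly and pass the bytes as a query parameter,
# rather than formatting them into the SQL string.
c.execute("insert into testac values (?, ?)", (47, s.encode("utf-8")))

c.execute("select * from testac where object_id=47")
row = c.fetchone()

# The stored value is the raw three-byte UTF-8 sequence;
# decoding it recovers the original code point.
title = row[1].decode("utf-8")

conn.close()
```

Passing the value as a parameter leaves quoting to the driver, which is also how the MySQLdb calls in the session above are written.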
Skip