web app breakage with utf-8

elmo elmo at msnnotaol.com
Thu Jul 6 15:41:32 EDT 2006


On Thu, 06 Jul 2006 19:16:53 +0200, Stefan Behnel wrote:
>> 
>> Is there a correct way to handle text input from a <FORM> when the page is
>> utf-8 and that input is going to be used in SQL statements? I've tried
>> things like (with no success): 
>> sql = u"select * from blah where col='%s'" % input
> 
> What about " ... % unicode(input, "UTF-8")" ?
> 
> 

I guess it's similar. I've had partial success with input.decode('utf-8')
before DB usage, and then output.encode('utf-8') for output. But although
this stores and displays newly added utf-8 texts correctly, it
causes other problems when displaying the existing texts; I think
they're suffering from a double-encoding issue. It seems rather
strange that the encode/decode step appears to be required now but not before.
Is this how it should be done?
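A minimal sketch of the decode-on-input / encode-on-output pattern described above. It is written for Python 3, where `bytes` and `str` play the roles of Python 2's `str` and `unicode`. The helper names `from_form` and `to_page` are hypothetical; the point is that each conversion happens exactly once, at the boundary, so text is never encoded twice.

```python
# Python 3 sketch: bytes at the edges, text (str) inside the application.

def from_form(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw form bytes exactly once, at the input boundary."""
    return raw.decode(encoding)

def to_page(text: str, encoding: str = "utf-8") -> bytes:
    """Encode text exactly once, at the output boundary."""
    return text.encode(encoding)

raw_input = "café".encode("utf-8")   # what a browser sends from a utf-8 page
text = from_form(raw_input)          # a text string: 'café'
page_bytes = to_page(text)           # the same bytes that came in
```

Everything between the two boundaries (validation, searching, storage through the DB driver) then works on text only.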



> 
> You didn't tell us what database you are using, which encoding your
> database uses, which Python-DB interface library you deploy, and lots of
> other things that might be helpful to solve your problem.

That would be MySQLdb, with the database in latin1, though I've tried various
methods to make it utf-8 (there's lots of guidance online). But I only tried
that after I discovered the breakage with the newer Python; i.e. it had worked
for years on both machines and with various Python versions. I omitted that
info because I can paste the SQL into mysql's shell and it does the expected
thing with no errors, so I assumed the DB itself isn't the cause. I guess it
could be a new MySQLdb issue causing the breakage.
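One way a latin1 database can produce the "double encoding" described above is utf-8 bytes being read back as latin1, which turns each accented character into two. A small Python 3 sketch that reproduces the symptom; `mis_read` and `repair` are hypothetical helpers, and `repair` only makes sense when the text really was utf-8 mis-read as latin1.

```python
def mis_read(text: str) -> str:
    """Simulate utf-8 bytes being interpreted as latin1 (e.g. by a latin1 connection)."""
    return text.encode("utf-8").decode("latin-1")

def repair(garbled: str) -> str:
    """Undo one round of mis-reading: recover the original bytes, then decode as utf-8."""
    return garbled.encode("latin-1").decode("utf-8")

original = "café"
garbled = mis_read(original)   # 'cafÃ©', the typical double-encoding look
fixed = repair(garbled)        # back to 'café'
```

Calling `repair` on text that was stored correctly usually raises `UnicodeDecodeError`, so it should not be applied blindly to every existing row.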


I feel I can see part of the light, but if I'm close to what's needed,
it isn't practical to add encode/decode handling site-wide, especially as
some of the data gets moved to Oracle for other applications (most of which
are written in Perl).



I'm thinking I now need to do the following. Is this the norm?

get user input from web
text.encode('utf-8')
store or use as search in DB
text.decode('utf-8')
display page etc
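The round trip can be sketched end to end with a parameterized query instead of `%` string formatting. Assumptions: Python 3, and the standard library's `sqlite3` standing in for MySQLdb so the example runs anywhere; the table `blah` and column `col` come from the quoted query, and `handle_request` is a hypothetical name. sqlite3 uses `?` placeholders while MySQLdb uses `%s`; in both cases the driver, not Python's `%` operator, inserts and quotes the value.

```python
import sqlite3

def handle_request(raw_input: bytes) -> bytes:
    # Get user input from the web and decode it once.
    text = raw_input.decode("utf-8")

    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE blah (col TEXT)")
        # Store: pass the text as a parameter; the driver handles quoting.
        conn.execute("INSERT INTO blah (col) VALUES (?)", (text,))
        # Search: same idea, the SQL string itself is never formatted.
        row = conn.execute("SELECT col FROM blah WHERE col = ?", (text,)).fetchone()
    finally:
        conn.close()

    # sqlite3 returns str for TEXT columns; encode once for the page.
    found = row[0] if row else ""
    return found.encode("utf-8")
```

A value such as `O'Brien` would break the quoted `col='%s'` query; with placeholders it round-trips unchanged.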

The encode/decode stages have never been required before :-(


> 
> Stefan




More information about the Python-list mailing list