web app breakage with utf-8

Thu Jul 6 12:35:12 EDT 2006

Hello, after two days of failed efforts and googling, I thought I had
better seek advice or observations from the experts. I would be grateful
for any input.

We have various small internal web applications that use utf-8 pages for
storing, searching and retrieving user input. They have worked fine for
years with non-ASCII values, including Russian, Greek and lots of accented
characters. They still do on an old version of Python (2.2.1), and there's
nothing in the code to decode or encode the input; it has *just worked*.

Recently, however, while testing on a dev machine, I noticed that any
characters outside ASCII cause our SQL statement handling to break with
UnicodeDecodeError exceptions on newer versions of Python (2.3 and 2.4).
There are a number of threads online suggesting converting to unicode
types, and similar themes, but I'm having no success. I am probably
completely misunderstanding something fundamental. :-(
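
A minimal sketch of the failure mode, assuming the form value arrives as
UTF-8 bytes and gets mixed with a unicode string somewhere along the way.
When Python 2 combines the two types it converts the byte string with the
ASCII codec; the explicit decode below stands in for that implicit step. The
value and the variable names are made up for illustration.

```python
# Hypothetical form value: the UTF-8 bytes for an accented "e".
form_bytes = u"\u00e9".encode("utf-8")

# Decoding with the ASCII codec is what Python 2 does implicitly when a
# byte string is combined with a unicode string; any byte >= 0x80 fails.
try:
    form_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the real encoding of the page succeeds.
decoded = form_bytes.decode("utf-8")
```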

My first question: did something change in the handling of plain byte
strings that made it stricter? I'm surprised there aren't more reports of
problems like this online.

Is there a correct way to handle text input from a <FORM> when the page is
utf-8 and that input is going to be used in SQL statements? I've tried
things like (with no success): 
sql = u"select * from blah where col='%s'" % input
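
For comparison, a sketch of the pattern usually suggested for this situation:
decode the form bytes once, pass the value as a bound parameter instead of
formatting it into the SQL text, and encode again only when writing the page.
This is an illustration under assumptions, not our actual code: sqlite3 (in
the standard library from Python 2.5 on) stands in for the real database
driver, the form value is made up, and other DB-API drivers use a different
placeholder style than "?".

```python
import sqlite3  # stand-in driver, an assumption; not the driver from the post

# Hypothetical raw form value as UTF-8 bytes (a short Russian word).
form_bytes = u"\u0416\u0443\u043a".encode("utf-8")

# Decode once at the input boundary, naming the page encoding explicitly.
value = form_bytes.decode("utf-8")

conn = sqlite3.connect(":memory:")
conn.execute("create table blah (col text)")

# Bound parameters keep the value out of the SQL string entirely.
conn.execute("insert into blah (col) values (?)", (value,))
rows = conn.execute("select col from blah where col = ?", (value,)).fetchall()

# Encode once at the output boundary, when building the page for the browser.
page_bytes = rows[0][0].encode("utf-8")
conn.close()
```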

Doing sql = sql.decode('latin1') prior to execution prevents some of the
UnicodeDecodeError exceptions, but the data retrieved from the tables
is no longer usable, causing breakage when it is used to create the output
for the browser.
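
A small sketch of why the latin1 decode hides the exception yet damages the
data, assuming the bytes really are UTF-8: latin1 maps every byte to some
character, so it never raises, but each multi-byte UTF-8 sequence turns into
several wrong characters. The sample value is made up.

```python
# UTF-8 bytes for a single accented "e" (two bytes on the wire).
utf8_bytes = u"\u00e9".encode("utf-8")

# latin1 accepts any byte, so this never raises, but one character
# becomes two unrelated ones.
wrong = utf8_bytes.decode("latin-1")

# Decoding with the encoding the page actually uses gives one character.
right = utf8_bytes.decode("utf-8")

# Encoding the damaged text for a utf-8 page does not restore the input.
round_trip = wrong.encode("utf-8")
```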

I really am at a loss as to what is going wrong, when everything works fine
on crusty old 2.2.1. What are others doing for capture, storage, and output
of utf-8 on the web?

Rgds,
Jason