UTF-8 in basic CGI mode

Wed Jan 16 09:31:05 EST 2008

coldpizza  <vriolk at gmail.com> wrote:
>I am using this 'word' variable like this:
>
>print u'''<input type="text" name="blabla" value="%s">''' % (word)
>
>and apparently this causes exceptions with non-ASCII strings.
>
>I've also tried this:
>print u'''<input type="text" name="blabla" value="%s">''' % (word.encode('utf8'))
>but I still get the same UnicodeDecodeError..

Your 'word' is a byte string (presumably UTF-8 encoded). When Python
is asked to insert a byte string into a unicode string (as you are
doing with the % operator, but the same applies to concatenation
with the + operator) it first tries to convert the byte string to
unicode using the default encoding. That default is 'ascii', and the
ascii codec takes a very strict view of what ASCII is: only byte
values below 128 qualify. Anything else raises UnicodeDecodeError.
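
To see how strict the ascii codec is, here is a small sketch. The
sample value b'caf\xc3\xa9' is my assumption standing in for your
'word'; the explicit decode('ascii') is roughly what Python 2 does
behind the scenes when it mixes the two string types. (The b''
prefix needs Python 2.6 or later; on older versions drop it.)

```python
# Hypothetical stand-in for 'word': the UTF-8 bytes for "cafe" with an
# accented final e. The last two bytes (0xC3 0xA9) are both >= 128.
word = b'caf\xc3\xa9'

# Decoding with the ascii codec rejects any byte outside 0..127.
try:
    word.decode('ascii')
    failed = False
except UnicodeDecodeError:
    failed = True

# A pure-ASCII byte string decodes without complaint.
plain = b'cafe'.decode('ascii')
```

So the exception depends only on the data: the same code works until
the first non-ASCII byte turns up.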

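To get it to work, you need to *decode* word, not encode it: it is
already UTF-8 (or something) encoded. Calling encode() on a byte
string makes Python decode it as ASCII first, which is why your
second attempt fails with the same UnicodeDecodeError. As a rule,
use encode() to turn unicode strings into byte strings, and decode()
to go in the other direction.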
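
Something along these lines should do it. This is only a sketch: it
assumes the input really is UTF-8 and that you want UTF-8 going back
out, and the sample bytes and the names 'text', 'html' and 'output'
are mine, not from your script.

```python
# Hypothetical stand-in for the raw CGI value: UTF-8 encoded bytes.
word = b'caf\xc3\xa9'

# Decode once on the way in: byte string -> unicode string.
# 'utf-8' is an assumption; use whatever encoding the form really sends.
text = word.decode('utf-8')

# Build the markup entirely in unicode, so no implicit ascii
# conversion is ever needed.
html = u'''<input type="text" name="blabla" value="%s">''' % (text)

# Encode once on the way out: unicode string -> byte string.
output = html.encode('utf-8')
```

'output' is then a plain byte string that is safe to write to the
response, provided the Content-Type header declares the same charset.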

-- 
\S -- siona at chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
   "Frankly I have no feelings towards penguins one way or the other"
        -- Arthur C. Clarke
   her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump