UTF-8 in basic CGI mode

coldpizza vriolk at gmail.com
Thu Jan 17 04:38:03 EST 2008


Thanks, Sion, that makes sense!

Would it be correct to assume that the encoding of strings retrieved
by FieldStorage() would be the same as the encoding of the submitted
web form (in my case utf-8)?

Funnily enough, I have the same form implemented in PSP (Python Server
Pages), running under Apache with mod_python, and it works
transparently with no explicit charset translation required.

On Jan 16, 4:31 pm, Sion Arrowsmith <si... at chiark.greenend.org.uk>
wrote:
> coldpizza  <vri... at gmail.com> wrote:
> >I am using this 'word' variable like this:
>
> >print u'''<input type="text" name="blabla" value="%s">''' % (word)
>
> >and apparently this causes exceptions with non-ASCII strings.
>
> >I've also tried this:
> >print u'''<input type="text" name="blabla" value="%s">''' %
> >(word.encode('utf8'))
> >but I still get the same UnicodeDecodeError.
>
> Your 'word' is a byte string (presumably UTF-8 encoded). When Python
> is asked to insert a byte string into a unicode string (as you are
> doing with the % operator, but the same applies to concatenation
> with the + operator) it attempts to convert the byte string into
> unicode. The default encoding is 'ascii', and the ascii codec
> takes a very strict view of what an ASCII character is: only
> byte values below 128 are ASCII.
>
> To get it to work, you need to *decode* word. It is already UTF-8
> (or something) encoded. Under most circumstances, use encode() to
> turn unicode strings into byte strings, and decode() to go in the
> other direction.
>
> --
> \S -- si... at chiark.greenend.org.uk --http://www.chaos.org.uk/~sion/
>    "Frankly I have no feelings towards penguins one way or the other"
>         -- Arthur C. Clarke
>    her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
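
The decode/encode advice quoted above can be sketched roughly as
follows. This is only an illustration, not code from the thread: the
sample value and the names word_bytes, html, output and ascii_error
are made up, and it assumes the form really was submitted as UTF-8.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for the value read from the form: a byte string
# holding UTF-8 encoded text (assumption: the form was submitted as UTF-8).
word_bytes = u"caf\u00e9".encode("utf-8")

# The strict ascii codec rejects any byte value of 128 or above, which is
# what an implicit byte-string-to-unicode conversion runs into.
ascii_error = None
try:
    word_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc

# decode(): byte string -> unicode string, naming the real encoding.
word = word_bytes.decode("utf-8")

# Both operands are unicode now, so no implicit conversion takes place.
html = u'<input type="text" name="blabla" value="%s">' % word

# encode(): unicode string -> byte string, done once when producing output.
output = html.encode("utf-8")
```

The idea is to decode as soon as the bytes come in, work with unicode
strings in between, and encode only when writing the response.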



