CGI and Unicode

Andrew Clover and-google at doxdesk.com
Mon Jun 23 18:49:42 EDT 2003


jhefferon at smcvt.edu (Jim Hefferon) wrote:

> the best that I can hope for is to set the page with the form on it
> to be showing, say UTF-8, and then the data should show up UTF-8
> encoded at my site.  

Yes. Theoretically you should also be able to use the accept-charset
attribute on <form>, but many common browsers ignore that.
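
A minimal sketch of the page side, assuming you settle on UTF-8. The
function name, the action URL and the field name "comment" are made up
for illustration. It declares the charset in the Content-Type header and
also sets accept-charset, for the browsers that honour it:

```python
import html

PAGE_ENCODING = "utf-8"  # assumption: the encoding chosen for the form page


def render_form_page(action="/cgi-bin/handler.py"):
    """Return a complete CGI response (headers + HTML) as encoded bytes.

    The action URL and the field name are hypothetical placeholders.
    """
    body = (
        "<html><head><title>Form</title></head><body>\n"
        '<form method="post" action="%s" accept-charset="%s">\n'
        '<input type="text" name="comment">\n'
        '<input type="submit">\n'
        "</form></body></html>\n"
    ) % (html.escape(action, quote=True), PAGE_ENCODING)
    # The charset declared here is what the browser will normally use
    # when it encodes the submitted values.
    header = "Content-Type: text/html; charset=%s\r\n\r\n" % PAGE_ENCODING
    return (header + body).encode(PAGE_ENCODING)
```

In a CGI script the returned bytes would be written to standard output
unchanged.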

> But when I ask what is the type of the variable that I get from 
> the cgi module, it comes out as StringType, not UnicodeType.

Yes, you have to decode all submitted strings manually when using the
cgi module.
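
For instance, something along these lines, assuming the form page was
served as UTF-8. The helper name decode_fields and the plain dict of
raw values are illustrative, not part of the cgi module. It is written
so that it runs on a current Python, where the raw values are bytes; on
the Python discussed here the same .decode() call turns a str into a
unicode object:

```python
def decode_fields(raw_fields, encoding="utf-8"):
    """Decode every submitted value from raw bytes to text.

    raw_fields maps field names to the undecoded byte strings taken
    from the request; encoding is the charset the form page was sent in.
    """
    return {name: value.decode(encoding) for name, value in raw_fields.items()}


# Example: one submitted field holding UTF-8 bytes.
submitted = {"comment": b"na\xc3\xafve"}
fields = decode_fields(submitted)
```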

You don't actually lose anything this way: although browsers are
*supposed to* send charset information with a multipart/form-data
request, in practice they don't, so you have to assume they used the
encoding of the page you originally sent them.
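
One way to write that down, as a sketch only: honour a charset parameter
on a part's Content-Type header if one ever turns up, and otherwise
assume the page encoding. PAGE_ENCODING and both function names are
assumptions for illustration; the header parsing uses the standard
email.message module:

```python
from email.message import Message

PAGE_ENCODING = "utf-8"  # assumption: the encoding the form page was sent in


def charset_for_part(content_type_header=None, default=PAGE_ENCODING):
    """Return the charset named in a part's Content-Type, else the default."""
    msg = Message()
    if content_type_header:
        msg["Content-Type"] = content_type_header
    return msg.get_content_charset(default)


def decode_part(raw, content_type_header=None):
    """Decode one submitted value, falling back to the page encoding."""
    charset = charset_for_part(content_type_header)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or malformed bytes: fall back to the page
        # encoding and mark undecodable bytes instead of failing.
        return raw.decode(PAGE_ENCODING, errors="replace")
```

The fallback replaces undecodable bytes rather than raising, on the
grounds that a handler which shows a slightly mangled value is usually
preferable to one that dies on unexpected input.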

Plug: you might want to look at:

  http://www.doxdesk.com/software/py/form.html

This is a replacement for the 'cgi' module which tries to make things
generally less hassle; it will optionally give you Unicode input
automatically, for one.

-- 
Andrew Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/



