[Python-Dev] Can the cgi module be made Unicode-aware?

Guido van Rossum guido@python.org
Thu, 11 Apr 2002 08:56:26 -0400


> I keep trying to handle various places in my code where I can get
> input in non-ASCII encodings.  Today I realized the cgi module does
> nothing to translate Unicode data into unicode objects.  I see in
> one instance that I am getting data that is clearly utf-8 encoded,
> but I see nothing in the CGI script's environment variables to
> suggest the client web browser told the server how the data was
> encoded other than the obvious "Content-Type:
> application/x-www-form-urlencoded".  Is utf-8 implied for the data
> once the url encoding has been reversed?

I very much doubt it.  You probably received that UTF-8 data from a
browser that does not conform to the standard.

> Should the cgi module be made Unicode-aware?  If so, how?  I can
> never remember the incantation to convert non-ASCII string objects
> to Unicode objects and nothing I've tried by trial-and-error so far
> works.

I must be misunderstanding your question, because the answer I'm
thinking of is unicode(s,'utf8') and that can't possibly be what you
can never remember.
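
A minimal sketch of that conversion, for illustration only.  It
assumes Python 3, where the unicode(s, 'utf8') call above is spelled
bytes.decode('utf-8'), and it assumes the client really did send
UTF-8.  The helper name decode_form_value and the sample value are
hypothetical; note that unquote_to_bytes does not turn '+' into a
space.

```python
from urllib.parse import unquote_to_bytes


def decode_form_value(raw, encoding="utf-8"):
    """Decode an already URL-decoded form value (bytes) into text.

    The utf-8 default is an assumption about the client, not something
    the request is guaranteed to state.
    """
    return raw.decode(encoding)


# Reverse the URL encoding first, then decode the resulting bytes.
raw_bytes = unquote_to_bytes("caf%C3%A9")
value = decode_form_value(raw_bytes)
```

If the bytes are not valid UTF-8, decode raises UnicodeDecodeError,
so a wrong guess about the encoding usually fails loudly.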

> I *don't* want to adopt the workaround outlined in FAQ
> question 4.102 (change the default site-wide encoding).  Perhaps
> that question should be extended with more appropriate information
> about converting raw strings with non-ASCII content to unicode.

(Another approach compares the converted string to the unconverted
one and catches any exception; if no exception is raised, the input
string was pure ASCII and the Unicode conversion is unnecessary.)
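
The pure-ASCII check can be sketched as follows.  This is a Python 3
analogue, not the comparison trick itself: instead of comparing the
converted and unconverted strings, it attempts a strict ASCII decode
and catches the exception.  The function name is_pure_ascii is
hypothetical.

```python
def is_pure_ascii(raw):
    """Return True if the byte string contains only ASCII bytes."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        # At least one byte is outside the 0-127 range.
        return False
    return True


plain = is_pure_ascii(b"hello")
accented = is_pure_ascii(b"caf\xc3\xa9")
```

When the check succeeds, the value can be used as-is; otherwise an
explicit decode with a known encoding is still required.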

--Guido van Rossum (home page: http://www.python.org/~guido/)