[Web-SIG] parsing of urlencoded data and Unicode

Bill Janssen janssen at parc.com
Tue Jul 29 03:32:44 CEST 2008


> In wsgix I use utf-8 for decoding the QUERY_STRING, and the charset 
> specified in the POST'ed data (utf-8 or the charset found in the special 
> _charset_ field).

That's probably wrong.  We went through this recently on the
python-dev list.  While it's possible to tell the encoding of
multipart/form-data, the QUERY_STRING and
application/x-www-form-urlencoded data may be in arbitrary character
set encodings (see RFC 3986).  It's probably best not to try to map
them to strings; instead, return byte arrays for the values, and only
return strings for data that can be correctly decoded.  Otherwise, you
lose information that the app cannot recover.
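
A rough sketch of the bytes-first approach, assuming Python 3's
urllib.parse (the helper names parse_qs_bytes and try_decode are
made up for illustration and are not part of wsgix or any library).
Percent-escapes are turned into raw bytes, and a string is returned
only when the bytes decode cleanly in the encoding the caller asks
for.  A clean decode is not proof that the guess was right, so the
raw bytes should stay available to the app.

```python
from urllib.parse import unquote_to_bytes


def parse_qs_bytes(query_string):
    """Split a raw query string into (name, value) pairs of bytes.

    Hypothetical helper: no character set is assumed, so no
    information is lost at this stage.
    """
    pairs = []
    for field in query_string.split("&"):
        if not field:
            continue
        name, _, value = field.partition("=")
        # In form-encoded data '+' stands for a space; percent-escapes
        # are converted to the raw bytes they represent.
        pairs.append((unquote_to_bytes(name.replace("+", " ")),
                      unquote_to_bytes(value.replace("+", " "))))
    return pairs


def try_decode(raw, encoding="utf-8"):
    """Return a str if raw decodes in the given encoding, else the bytes."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw


# %E9 is a Latin-1 e-acute, %C3%A9 is the same character in UTF-8.
pairs = parse_qs_bytes("q=caf%E9&r=caf%C3%A9&s=a+b")
values = [try_decode(value) for _, value in pairs]
```

Here the Latin-1 value stays a byte string because it is not valid
UTF-8, while the other two values come back as strings.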

Bill

