[Web-SIG] parsing of urlencoded data and Unicode

Manlio Perillo manlio_perillo at libero.it
Tue Jul 29 08:06:17 CEST 2008


Bill Janssen wrote:
>> In wsgix I use utf-8 for decoding the QUERY_STRING, and the charset 
>> specified in the POST'ed data (utf-8 or the charset found in the special 
>> _charset_ field).
> 
> That's probably wrong.  We went through this recently on the
> python-dev list.  While it's possible to tell the encoding of
> multipart/form-data, 

With multipart/form-data the problem should be the same:
the content type is specified only for file fields.

> the query_string and x-www-form-urlencoded data
> may be in arbitrary character set encodings (see RFC 3986).  It's
> probably best to not try to map them to strings; instead, return byte
> arrays for the value, and only return strings for data that can be
> correctly decoded.  Otherwise, you lose information that the app
> cannot recover.
> 

Interesting, thanks.

I have read the Django code and, as far as I can tell, it always decodes
data to strings, but using "replace" error handling.
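
To make the difference concrete, here is a minimal Python 3 sketch of the
two approaches, not the actual Django or wsgix code. The helper name
decode_or_bytes and the sample values are hypothetical, and utf-8 is only
an assumed default encoding.

```python
from urllib.parse import unquote_to_bytes


def decode_or_bytes(raw, encoding="utf-8"):
    """Percent-decode one urlencoded value (hypothetical helper).

    Returns a str when the bytes decode cleanly with the assumed
    encoding, otherwise the raw bytes so the application can still
    try another charset.
    """
    data = unquote_to_bytes(raw.replace("+", " "))
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return data


latin1_value = "caf%E9"     # "café" sent as ISO-8859-1
utf8_value = "caf%C3%A9"    # "café" sent as UTF-8

# "replace" error handling: the original byte 0xE9 is gone for good.
lossy = unquote_to_bytes(latin1_value).decode("utf-8", "replace")

# Falling back to bytes: nothing is lost, the app can decode later.
kept = decode_or_bytes(latin1_value)
clean = decode_or_bytes(utf8_value)
```

With "replace", the ISO-8859-1 value becomes "caf" followed by U+FFFD, and
the application cannot tell which byte was there. With the fallback it gets
b"caf\xe9" and can still decode it as ISO-8859-1.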

Can you point me to the discussion on the python-dev list?

> Bill
> 


Manlio Perillo

