[Baypiggies] urllib.urlencode and encoding

Tung Wai Yip tungwaiyip at yahoo.com
Thu Apr 19 20:14:54 CEST 2007


> On Apr 18, 2007, at 11:17 PM, Keith Dart wrote:
>
>> On Wed, 18 Apr 2007 21:15:34 -0700
>> David Reid <dreid at dreid.org> wrote:
>>
>>> So I think it's still incorrect for urllib to make any such
>>> assumptions as to the data being UTF-8. (Though I hope it won't be in
>>> the future.)
>>
>> The RFC, and the previous discussion, have nothing to do with the
>> content (data) encoding. It's only concerned with the URL encoding.
>
> The relevant section of the HTML4 forms spec is concerned with the
> URL encoding if the URL is generated by the browser as part of a form
> submission.  So I'm still gonna have to go with it being pretty much
> completely wrong for urllib to make any assumptions about the charset
> of %-encoded data (either in a url segment or in query args.)  Not
> that life wouldn't be much nicer if everything were UTF-8, but the
> world isn't that nice to begin with.
>
> -David
> http://dreid.org


Here is an example. The key parameter is Big5-encoded. Welcome to the
Tower of Babel!

http://search.books.com.tw/exep/prod_search.php?cat=all&key=%A5i%B7R%A4O%B6q%A4j&image233223.x=13&image233223.y=10
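
To make the point concrete, here is a rough sketch of what handling that URL looks like. It assumes Python 3's urllib.parse (not the 2007-era urllib.urlencode discussed in this thread), and the variable names are only illustrative. The %-escapes are first turned back into raw bytes with no charset assumed; the caller then has to know, out of band, that those bytes are Big5.

```python
from urllib.parse import urlsplit, unquote_to_bytes, urlencode

URL = ("http://search.books.com.tw/exep/prod_search.php"
       "?cat=all&key=%A5i%B7R%A4O%B6q%A4j&image233223.x=13&image233223.y=10")

# Split the query string by hand so nothing is decoded behind our back.
query = urlsplit(URL).query
raw_pairs = dict(pair.split("=", 1) for pair in query.split("&"))

# Undo the %-escaping only: the result is bytes, with no charset implied.
raw_key = unquote_to_bytes(raw_pairs["key"])

# Guessing UTF-8 fails for this value, because 0xA5 cannot start a UTF-8 sequence.
try:
    raw_key.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding works once the caller supplies the right charset.
text = raw_key.decode("big5")

# Going the other way, the caller again has to name the charset explicitly.
reencoded = urlencode({"key": text}, encoding="big5")

print(raw_key, utf8_ok, len(text), reencoded)
```

Nothing in the URL itself says "Big5"; the only reason the decode step works is that the charset was supplied by hand, which is why a library cannot safely pick one on the caller's behalf.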

Wai Yip

