[Python-ideas] Fall back to encoding unicode strings in utf-8 if latin-1 fails in http.client

Emil Stenström em at kth.se
Thu Jan 7 13:50:49 EST 2016


On 2016-01-07 at 17:46, Cory Benfield wrote:
>> On 7 Jan 2016, at 16:32, Guido van Rossum <guido at python.org>
>> wrote:
>>
>> Personally I'm perplexed that Requests, which claims to be "HTTP
>> for Humans" doesn't take care of this but just lets http/client.py
>> blow up. (However, IIUC both 2838 and 1822 are about the
>> body.encode() call in Python 3's http/client.py at _send_request().
>> 1926 seems to originate in Requests itself; it's also Python 2.7.)
>
> The main reason is historical: this was missed in the original
> (substantial) rewrite in requests 2.0, and as a result we can’t
> change it without a backward compat break, just the same as Python.
> We’ll probably fix it in 3.0.

So as things stand:

* The general consensus seems to be that the raised error should be 
changed to something like: TypeError("Unicode string supplied without an 
explicit encoding")

* Python would like to change http.client to reject unicode input with 
an exception, but won't because of backwards compatibility

* Requests would like to do the same but won't because of backwards 
compatibility
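A minimal sketch of the first point, assuming the check runs on the body
before anything is sent. The helper name `encode_body` is hypothetical.
This is not http.client's actual code. It only shows a str body being
rejected with the suggested TypeError instead of being encoded implicitly:

```python
def encode_body(body):
    """Hypothetical helper: accept only bytes-like request bodies.

    A str body is rejected with a TypeError rather than being silently
    encoded with some default codec.
    """
    if isinstance(body, str):
        raise TypeError("Unicode string supplied without an explicit encoding")
    return body


# Callers must choose the encoding themselves.
payload = encode_body("Stenström".encode("utf-8"))

try:
    encode_body("Stenström")
except TypeError as exc:
    print(exc)  # the error now names the real mistake
```

The error message points at the actual problem (a missing encoding)
instead of at whichever character happened to fall outside the default
codec.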

I think it will be very hard to find code that breaks because the type of
the exception raised for invalid data changes. On the other hand, it's
VERY easy to find people who are affected by the confusing error that is
raised everywhere today.
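To illustrate the confusing error, here is a sketch that assumes the
Python 3.5-era behaviour where http.client encodes a str body as
ISO-8859-1 (Latin-1). No request is made. The encode call is reproduced
directly:

```python
# Assumption: a str body is encoded as http.client did it at the time,
# i.e. body.encode("iso-8859-1"). No network access happens here.
body = "Price: 10 €"  # U+20AC (euro sign) is outside Latin-1

error_message = None
try:
    body.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    # The message names a codec and a character position, not the real
    # mistake: a str was passed where bytes were expected.
    error_message = str(exc)
    print(error_message)

# Encoding explicitly sidesteps the problem.
payload = body.encode("utf-8")
```

A user who never chose Latin-1 sees a codec error about it, which is
why a TypeError asking for an explicit encoding is easier to act on.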

When a backward-incompatible change makes life easier for 99.9% of users,
and 0.1% of users need to debug a TypeError with a very clear error
message (which was probably a bug in their code to begin with), I'm
starting to question having a policy that strict.

/Emil

