[Python-ideas] Fall back to encoding unicode strings in utf-8 if latin-1 fails in http.client

Guido van Rossum guido at python.org
Thu Jan 7 14:04:13 EST 2016


On Thu, Jan 7, 2016 at 10:50 AM, Emil Stenström <em at kth.se> wrote:

> On 2016-01-07 at 17:46, Cory Benfield wrote:
>
>> On 7 Jan 2016, at 16:32, Guido van Rossum <guido at python.org> wrote:
>>>
>>> Personally I'm perplexed that Requests, which claims to be "HTTP
>>> for Humans" doesn't take care of this but just lets http/client.py
>>> blow up. (However, IIUC both 2838 and 1822 are about the
>>> body.encode() call in Python 3's http/client.py at _send_request().
>>> 1926 seems to originate in Requests itself; it's also Python 2.7.)
>>>
>>
>> The main reason is historical: this was missed in the original
>> (substantial) rewrite in requests 2.0, and as a result we can’t
>> change it without a backward compat break, just the same as Python.
>> We’ll probably fix it in 3.0.
>>
>
> So as things stand:
>
> * The general consensus seems to be that the raised error should be
> changed to something like: TypeError("Unicode string supplied without an
> explicit encoding")
>
> * Python would like to change http.client to reject unicode input with an
> exception, but won't because of backwards compatibility
>
> * Requests would like to do the same but won't because of backwards
> compatibility
>
> I think it will be very hard to find code that breaks because of a type
> change in the exception when sending invalid data. On the other hand, it's
> VERY easy to find people that are affected by the confusing error currently
> in use everywhere.
>
> When a backward-incompatible change makes life easier for 99.9% of users,
> and 0.1% of users need to debug a TypeError with a very clear error message
> (which was probably a bug in their code to begin with), I'm starting to
> question having a policy that strict.
>

What policy are you referring to? I don't think anyone objects to
making the error message clearer. The objection is against rejecting
unicode strings that in the past would have been successfully encoded using
Latin-1.

I'm not sure whether it's a good idea to change the exception type from
UnicodeError to TypeError -- the exception is really related to Unicode, so
keeping UnicodeError but changing the message sounds like the right thing
to do. And this can be done independently in both Requests and the stdlib.
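A minimal sketch of the approach favored above, using a hypothetical helper `encode_body` (this is not the actual http.client code): a str body that is encodable as Latin-1 is still encoded exactly as before, so nothing that used to work is rejected, while any other str still raises UnicodeEncodeError (a subclass of UnicodeError) but with a message telling the caller to encode explicitly.

```python
def encode_body(body, name="body"):
    """Hypothetical helper: encode a str body as Latin-1 (the behavior the
    thread attributes to http.client), re-raising with a clearer message."""
    if isinstance(body, bytes):
        # Already-encoded bodies pass through untouched.
        return body
    try:
        # Backward compatible: Latin-1-encodable strings still succeed.
        return body.encode("latin-1")
    except UnicodeEncodeError as err:
        # Keep the same exception type so existing `except UnicodeError`
        # clauses still match; only the reason text changes.
        bad = body[err.start:err.end]
        reason = (
            f"{name} ({bad!r:.20}) is not valid Latin-1; "
            f"encode it explicitly, e.g. {name}.encode('utf-8')"
        )
        raise UnicodeEncodeError(
            err.encoding, err.object, err.start, err.end, reason
        ) from None


if __name__ == "__main__":
    print(encode_body("café"))  # Latin-1 encodable, as before
    try:
        encode_body("snowman \u2603")
    except UnicodeError as exc:
        print(type(exc).__name__, exc)
```

Because only the message changes, callers that already catch UnicodeError see no difference in control flow, which is the compatibility property the objection is about.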

-- 
--Guido van Rossum (python.org/~guido)