[Python-ideas] Fall back to encoding unicode strings in utf-8 if latin-1 fails in http.client

Andrew Barnert abarnert at yahoo.com
Thu Jan 7 14:31:45 EST 2016


On Thursday, January 7, 2016 11:05 AM, Guido van Rossum <guido at python.org> wrote:

> I'm not sure whether it's a good idea to change the exception type from UnicodeError to TypeError -- the exception is really related to Unicode so keeping UnicodeError but changing the message sounds like the right thing to do. And this can be done independently in both Requests and the stdlib.

That sounds like a good idea. A UnicodeEncodeError (or a subclass of it?) with text like "HTTP body without encoding defaults to 'latin-1', which can't encode character '\u5555' in position 30: ordinal not in range(256)" would be pretty simple to implement, and would help a lot more than the current text. (And, for those who still can't figure it out, a unique error message means that within a few days of the change, googling it should turn up a relevant StackOverflow answer, which isn't true for the generic encoding error message.)
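A minimal sketch of what that could look like, assuming a hypothetical helper `encode_body` and a hypothetical subclass `BodyEncodingError` (neither is part of http.client). Subclassing UnicodeEncodeError keeps existing `except UnicodeEncodeError` handlers working; only `__str__` changes, so the message becomes unique and searchable:

```python
class BodyEncodingError(UnicodeEncodeError):
    """Hypothetical UnicodeEncodeError subclass with an HTTP-specific message."""

    def __str__(self):
        if self.end - self.start == 1:
            # Single bad character: show it escaped, as the built-in message does.
            where = (f"character {ascii(self.object[self.start])} "
                     f"in position {self.start}")
        else:
            # A run of bad characters: report the inclusive position range.
            where = f"characters in position {self.start}-{self.end - 1}"
        return ("HTTP body without encoding defaults to 'latin-1', "
                f"which can't encode {where}: {self.reason}")


def encode_body(body):
    """Encode an HTTP body, defaulting str to Latin-1 (hypothetical helper)."""
    if isinstance(body, bytes):
        return body  # already encoded; send as-is
    try:
        return body.encode("latin-1")
    except UnicodeEncodeError as err:
        # Re-raise with the same details but a clearer, searchable message.
        raise BodyEncodingError(err.encoding, err.object, err.start,
                                err.end, err.reason) from None
```

Calling `encode_body("x" * 30 + "\u5555")` raises a `BodyEncodingError` whose text matches the message quoted above, while `encode_body("caf\xe9")` still returns Latin-1 bytes.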


Requests could get fancier. For example, if the string starts with "{", the error message could ask whether they meant to use json=obj instead of data=json.dumps(obj). But I think that wouldn't be appropriate for the stdlib. (Especially since http.client doesn't have a json parameter...) Then again, it sounds like Requests is planning to stop implicitly encoding str bodies passed via data= as Latin-1 in 3.0 anyway, which would solve the problem more simply.
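The Requests-only heuristic could be sketched like this. `describe_latin1_failure` is a hypothetical name, not a Requests API; the JSON check is just the simple "starts with `{`" test described above:

```python
import json


def describe_latin1_failure(body):
    """Build a hypothetical Requests-style error message for a str body.

    Returns None if the body encodes as Latin-1 without error.
    """
    try:
        body.encode("latin-1")
    except UnicodeEncodeError as err:
        message = (
            "HTTP body without encoding defaults to 'latin-1', which can't "
            f"encode character {ascii(body[err.start])} in position {err.start}"
        )
        # Requests-only heuristic: a body starting with "{" is probably JSON.
        if body.lstrip().startswith("{"):
            message += "; did you mean json=obj instead of data=json.dumps(obj)?"
        return message
    return None


if __name__ == "__main__":
    # ensure_ascii=False keeps the raw non-Latin-1 character in the string.
    print(describe_latin1_failure(json.dumps({"name": "\u5555"}, ensure_ascii=False)))
```

Note that `json.dumps` escapes non-ASCII characters by default (`ensure_ascii=True`), so this failure mainly shows up with `ensure_ascii=False` or hand-built JSON strings.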

