[Python-ideas] Fall back to encoding unicode strings in utf-8 if latin-1 fails in http.client

Thu Jan 7 11:57:42 EST 2016

On Jan 7, 2016, at 01:20, Emil Stenström <em at kth.se> wrote:
> 
> This is also how other languages http libraries seem to deal with this, sending in unicode just works:

No, sending Unicode as UTF-8 doesn't "just work", except when the server is expecting UTF-8. Otherwise, it just makes the problem harder to debug.

Most commonly, people who run into this problem with requests are trying to send JSON or form-encoded data. In either case, the solution is simple: just pass the object to the json= or data= parameter. It's only if you try to do it half-way yourself, calling json.dumps but then not calling .encode, that you run into a problem.

I've also seen people run into this uploading files. Again, if you let requests just take care of it for you (by passing it the filename or file object), it just works. But if you try to do it half-way, reading the whole file into memory as a string but not encoding it, that's when you have problems.

The solution in every case is simple: don't make things harder for yourself by doing extra work and then trying to use the lower-level API, just let requests do it for you.

Of course if you're using http.client or urllib instead of requests, you don't have that option. But if http.client is too low-level for you, the solution isn't to hack up http.client to be more magical when used by people who don't know what they're doing in hopes that it'll work more often than it'll cause further and harder-to-debug problems, it's to tell them to use requests if they don't want to learn what they're doing.