requests.Session() how do you set 'replace' on the encoding?

dieter dieter at handshake.de
Fri Jul 3 01:59:24 EDT 2015


Veek M <vek.m1234 at gmail.com> writes:

> I'm getting a Unicode error:
>
> Traceback (most recent call last):
>   File "fooxxx.py", line 56, in <module>
>     parent = anchor.getparent()
> UnicodeEncodeError: 'gbk' codec can't encode character u'\xa0' in position 
> 8: illegal multibyte sequence

You give us very little context.

Using "getparent" suggests that you are working with a hierarchy,
likely some XML processing. In this case, the XML document
probably declared "gbk" as its encoding (otherwise, you would get
the default "utf-8") but contains characters that "gbk" cannot
represent, i.e. the declaration is wrong (which should not happen).


In general: when you need control over encoding handling because
an encoding causes problems deep inside a framework (as apparently
in your case), you can usually take the plain text first, fix any
encoding problems there, and only then pass the fixed text on to
your framework.
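
A minimal sketch of such a "fix first" step, assuming the text is
already a unicode string and that "gbk" is the encoding that must
not fail later on. The helper name "make_encodable" is made up for
this illustration; "errors='replace'" is the standard codec error
handler that substitutes "?" for anything the codec cannot encode.

```python
def make_encodable(text, encoding="gbk"):
    """Return `text` with everything `encoding` cannot represent replaced.

    Hypothetical helper, not part of any library.
    """
    # The character from the traceback, U+00A0 (no-break space), has an
    # obvious plain-ASCII stand-in, so map it explicitly first.
    text = text.replace(u"\xa0", u" ")
    # Round-trip through the target encoding: characters the codec cannot
    # encode become "?" instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)


print(make_encodable(u"price:\xa0100"))
```

Whether "?" is an acceptable substitute depends on your data;
"errors='ignore'" would drop the offending characters instead.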


> I'm doing:
> s = requests.Session()
> to suck data in, so.. how do i 'replace' chars that fit gbk

It does not seem that the problem occurs inside the "requests" module.
Thus, you have a chance to "intercept" the downloaded text
and fix encoding problems.
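
A rough sketch of such an interception, under the assumption that
you fetch pages with "session.get(url)" and use the decoded
"response.text". The function name "fetch_clean_text" and the URL
in the usage comment are only placeholders; I do not know what your
script really looks like.

```python
def fetch_clean_text(session, url, target_encoding="gbk"):
    """Download `url` and return text that `target_encoding` can encode.

    Hypothetical helper; `session` is expected to behave like a
    requests.Session (a .get(url) method returning an object with .text).
    """
    response = session.get(url)
    # response.text is the body decoded to unicode by requests.
    text = response.text
    # Replace whatever the target codec cannot represent with "?"
    # *before* the text reaches the XML/HTML processing code.
    return text.encode(target_encoding, errors="replace").decode(target_encoding)


# Possible usage (needs the third-party "requests" package and network):
#   import requests
#   s = requests.Session()
#   html = fetch_clean_text(s, "http://example.com/")
```

The cleaned text can then be handed to your parser in place of the
raw download.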
