how to detect the character encoding in a web page ?
Hans Mulder
hansmu at xs4all.nl
Sun Dec 23 20:30:47 EST 2012
On 24/12/12 01:34:47, iMath wrote:
> how to detect the character encoding in a web page ?
That depends on the site: different sites indicate
their encoding differently.
> such as this page: http://python.org/
If you download that page and look at the HTML code, you'll find a line:
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
So it's encoded as utf-8.
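If you'd rather not eyeball the HTML, a rough sketch of pulling that
declaration out of the raw bytes could look like the following. The
function name and the 4096-byte scan window are my own assumptions, and
the regex is a heuristic, not a real HTML parser.

```python
import re

# Matches charset=... inside a <meta> tag. This covers the http-equiv
# form shown above as well as the shorter <meta charset="..."> form.
META_CHARSET_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)

def charset_from_meta(html_bytes):
    """Return the charset named in a <meta> tag, or None if absent.

    Hypothetical helper: only the first 4096 bytes are scanned, on the
    assumption that the declaration sits near the top of the page.
    """
    match = META_CHARSET_RE.search(html_bytes[:4096])
    if match is None:
        return None
    return match.group(1).decode('ascii').lower()
```

Working on bytes rather than str matters here: you can't decode the
page to text until you know the encoding you are looking for.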
Other sites declare their charset in the Content-Type HTTP header line.
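For the header case, the standard library can already parse a
Content-Type value for you. A minimal sketch, where the function name
is just an illustration of mine:

```python
from email.message import Message

def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None.

    Hypothetical helper. The response headers returned by
    urllib.request.urlopen() are a subclass of Message, so calling
    response.headers.get_content_charset() should behave the same way.
    """
    msg = Message()
    msg['Content-Type'] = header_value
    # get_content_charset() lowercases the value and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()
```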
And then there are sites that declare nothing and rely on the default,
which HTTP/1.1 historically defined as ISO-8859-1 for text. And sites
that get it wrong, and send data in a different encoding from what
they declare.
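Since the declaration can be missing or plain wrong, it is safer to
treat it as a hint rather than a fact. One possible sketch: try the
declared charset first, then fall back through a list of guesses. The
function name and the fallback order are assumptions on my part, not
any standard.

```python
def decode_page(body, declared=None, fallbacks=('utf-8', 'latin-1')):
    """Decode body bytes, returning (text, charset_actually_used).

    Hypothetical helper: 'declared' is whatever the header or <meta>
    tag claimed; it is tried first but not trusted.
    """
    candidates = ([declared] if declared else []) + list(fallbacks)
    for name in candidates:
        try:
            return body.decode(name), name
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name, or bytes invalid in this encoding.
            continue
    # Last resort: keep going, replacing undecodable bytes.
    return body.decode('utf-8', errors='replace'), 'utf-8'
```

Note that latin-1 accepts any byte sequence, so with the default
fallbacks the loop always returns; a successful decode only tells you
the bytes were valid in that encoding, not that the guess was right.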
Welcome to the real world,
-- HansM