how to detect the character encoding in a web page ?

Hans Mulder hansmu at xs4all.nl
Sun Dec 23 20:30:47 EST 2012


On 24/12/12 01:34:47, iMath wrote:
> how to detect the character encoding  in a web page ?

That depends on the site: different sites indicate
their encoding differently.

> such as this page:  http://python.org/

If you download that page and look at the HTML code, you'll find a line:

  <meta http-equiv="content-type" content="text/html; charset=utf-8" />

So the page declares itself to be encoded as UTF-8.
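
If you'd rather not read the HTML by eye, something like this pulls the
charset out of a meta tag. It is only a rough sketch: the function name
sniff_meta_charset and the 4096-byte limit are my own choices, and a
regular expression is not a real HTML parser, so treat the result as a hint.

```python
import re
from typing import Optional

# Looks for charset=... inside a <meta ...> tag. This covers both the
# http-equiv form shown above and the shorter <meta charset="..."> form.
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(raw: bytes, limit: int = 4096) -> Optional[str]:
    """Return the charset named in a <meta> tag near the top of the page.

    Works on the undecoded bytes, since we don't know the encoding yet.
    Only the first `limit` bytes are searched (an arbitrary cut-off).
    Returns None when no declaration is found.
    """
    match = _META_CHARSET.search(raw[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

You would call it on the raw bytes of the downloaded page, before any
decoding, e.g. sniff_meta_charset(data).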

Other sites declare their charset in the Content-Type HTTP header line.
And then there are sites relying on the default.  And sites that get
it wrong, and send data in a different encoding from what they declare.
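
Putting those cases together, one possible approach is to try the HTTP
header first, then a meta tag, then a default, and to move on whenever
the bytes refuse to decode. The sketch below is an illustration, not a
definitive recipe: decode_page and header_charset are names I made up,
the utf-8 default is my assumption (pick whatever suits your sites), and
latin-1 is used as a last resort only because it never fails to decode.

```python
import re
from email.message import Message
from typing import Optional, Tuple

_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)


def header_charset(content_type: Optional[str]) -> Optional[str]:
    """Extract charset from a Content-Type header value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_page(raw: bytes, content_type: Optional[str] = None,
                default: str = "utf-8") -> Tuple[str, str]:
    """Decode raw page bytes; return (text, encoding actually used)."""
    candidates = [header_charset(content_type)]   # 1. HTTP header
    match = _META_CHARSET.search(raw[:4096])
    if match:                                     # 2. meta tag
        candidates.append(match.group(1).decode("ascii").lower())
    candidates.append(default)                    # 3. assumed default
    for name in candidates:
        if not name:
            continue
        try:
            return raw.decode(name), name
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name, or the site lied about its encoding.
            continue
    # Last resort: latin-1 maps every byte, so this cannot raise.
    return raw.decode("latin-1"), "latin-1"
```

With urllib.request you would pass response.read() and
response.headers.get("Content-Type"). Note that a successful decode does
not prove the declaration was right; it only means the bytes happened to
be valid in that encoding.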


Welcome to the real world,

-- HansM


