help with (x)html / xml encoding...

Steven Taschuk staschuk at telusplanet.net
Fri Mar 21 15:13:35 EST 2003


Quoth lt:
  [...]
> and in python i do :
> >>> import urllib
> >>> sock = urllib.urlopen('http://192.168.0.1/example.html')
> >>> sock.info().getencoding()
> '7bit'

That's the content transfer encoding of the HTTP message, a MIME
header that describes how the bytes of the content were encoded
for transmission over the network.  getencoding() returns '7bit'
when no Content-Transfer-Encoding header is present, which is the
usual case for HTTP.  In full MIME, binary files might be, for
example, base64-encoded for transport over protocols that are not
8-bit clean; HTTP is binary clean, so the problem doesn't really
arise.

If this seems like gobbledygook, forget it; it doesn't matter.
The point is that it has nothing to do with the character set used
in the text of the content, if it is text.
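
If a concrete picture helps, here is a minimal sketch of the two
layers.  The sample string and the choice of base64 are mine, purely
for illustration; it is written for a current Python 3 interpreter.

```python
import base64

# Six Cyrillic letters, written as escapes so the source stays ASCII.
text = '\u041f\u0440\u0438\u0432\u0435\u0442'

# Character set: how characters are turned into bytes.
raw = text.encode('windows-1251')

# Content transfer encoding: how those bytes are made safe for
# transport.  Over HTTP this step is normally skipped altogether.
wire = base64.b64encode(raw)

# The receiver undoes the two layers in reverse order.
assert base64.b64decode(wire) == raw
assert raw.decode('windows-1251') == text
```

The two steps are independent: knowing that the transfer encoding is
7bit or base64 tells you nothing about which character set to pass to
decode().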

What you want is this:

	>>> s = urllib.urlopen('http://zvuki.ru/A/P/6655/')
	>>> s.info().dict['content-type']
	'text/html; charset=windows-1251'
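
To get from that header value to decoded text, something like the
following should do.  It is only a sketch: charset_from_content_type
is a name I made up, the bytes stand in for what s.read() would
return, and it is written for a current Python 3 interpreter.  Note
that a server is not obliged to send a charset parameter, so you
need a fallback of your own choosing.

```python
from email.message import Message

def charset_from_content_type(value, default=None):
    """Return the charset parameter of a Content-Type value, or default."""
    msg = Message()
    msg['Content-Type'] = value
    # get_content_charset() lower-cases the name and returns the
    # fallback when the header carries no charset parameter.
    return msg.get_content_charset(default)

header = 'text/html; charset=windows-1251'
charset = charset_from_content_type(header)

# Stand-in for the bytes read from the socket (an assumption for this
# example, not real page content).
body = b'\xcf\xf0\xe8\xe2\xe5\xf2'
text = body.decode(charset)
```

Letting the email package parse the header avoids hand-rolled string
splitting, which tends to trip over quoting and letter case.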

-- 
Steven Taschuk                  staschuk at telusplanet.net
"Telekinesis would be worth patenting."  -- James Gleick




