help with (x)html / xml encoding...

Steven Taschuk staschuk at telusplanet.net
Thu Mar 20 21:51:11 EST 2003


Quoth lt:
> I'm looking for a way to extract the encoding from a file retrieved by urllib.
> I'm planning to create a "restricted" parser which will only examine <?
> and <meta tags, to check for:
> 
> <meta http-equiv="content-type" content="text/html; charset=xxxencodingxxx">
> or
> <?xml version="1.0" encoding="'xxxencodingxxx'"?>
> 
> Do you think that is enough? How would you do it?

You should also check the headers in urlopen(foo).info() for a
Content-Type header; a charset parameter in that header is supposed
to take precedence over either of the above.
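
A rough sketch of that order of precedence, assuming Python 3 (where
urllib.request.urlopen(url).info() returns an email.message.Message
subclass). The name sniff_encoding, the two regular expressions, and
the 1024-byte scan window are my own illustrative choices, not a
complete or standards-exact parser:

```python
import re
from email.message import Message

# Deliberately narrow patterns: they cover only the two in-document
# declarations mentioned in the question, nothing more.
XML_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def sniff_encoding(headers, body, default=None):
    """Return the declared encoding name, or `default` if none is found.

    headers: an email.message.Message (e.g. what .info() returns)
    body:    the raw bytes of the document
    """
    # 1. The HTTP Content-Type header wins if it carries a charset.
    charset = headers.get_content_charset()
    if charset:
        return charset

    # 2. Otherwise fall back to the declarations inside the document,
    #    looking only at the first 1024 bytes (an arbitrary cutoff).
    head = body[:1024]
    for pattern in (XML_DECL, META_CHARSET):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()

    return default
```

With a live response this would be called roughly as
sniff_encoding(response.info(), response.read()), where response comes
from urllib.request.urlopen(url).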

-- 
Steven Taschuk                  staschuk at telusplanet.net
"Telekinesis would be worth patenting."  -- James Gleick




