save gb-2312 web page in a .html file

Matt Nordhoff mnordhoff at mattnordhoff.com
Wed Dec 26 18:31:24 EST 2007


Peter Pei wrote:
> You must be right, since I tried one page and it worked. But there is 
> something wrong with this particular page: 
> http://overseas.btchina.net/?categoryid=-1. When I open the saved file (with 
> IE7), it is all messed up.
> 
>     import urllib2
>
>     url = 'http://overseas.btchina.net/?categoryid=-1'
>     headers = { 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' }
>     req = urllib2.Request(url, None, headers)
>     page = urllib2.urlopen(req).read()
>
>     # 'wb' writes the downloaded bytes unchanged (no newline translation)
>     htmlfile = open('btchina.html','wb')
>     htmlfile.write(page)
>     htmlfile.close()

I dunno. The file does declare its charset, so unless IE ignores that
declaration and guesses the encoding wrongly, it should display fine.
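Because the saved file carries no HTTP headers, IE has to rely on the charset declared inside the file itself (or on its own guess). The following is a minimal Python 3 sketch for checking that, not the poster's code: it uses `urllib.request` in place of Python 2's `urllib2`, the helper names `declared_charset`, `save_bytes` and `save_page` are hypothetical, and the regex is a rough heuristic rather than an HTML parser. It assumes the declaration sits in a `<meta>` tag within the first 4096 bytes.

```python
import re
import urllib.request

# Rough heuristic for <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
# Not a full HTML parser.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def declared_charset(html_bytes, limit=4096):
    """Return the lowercased charset declared in a <meta> tag, or None.

    Only the first `limit` bytes are searched (assumed to cover <head>).
    """
    match = _META_CHARSET.search(html_bytes[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


def save_bytes(path, body):
    """Write body to path unchanged; 'wb' avoids newline translation."""
    with open(path, "wb") as f:
        f.write(body)


def save_page(url, path,
              user_agent="Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"):
    """Download url, save the raw body to path, and return the body bytes."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        body = response.read()
    save_bytes(path, body)
    return body
```

For example, `page = save_page('http://overseas.btchina.net/?categoryid=-1', 'btchina.html')` followed by `print(declared_charset(page))` shows whether the saved copy still declares gb2312; if it does, `page.decode('gb2312')` should succeed, which suggests the bytes on disk are intact and any garbling is on the browser side.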



More information about the Python-list mailing list