save gb-2312 web page in a .html file
Matt Nordhoff
mnordhoff at mattnordhoff.com
Wed Dec 26 18:31:24 EST 2007
Peter Pei wrote:
> You must be right, since I tried one page and it worked. But there is
> something wrong with this particular page:
> http://overseas.btchina.net/?categoryid=-1. When I open the saved file (with
> IE7), it is all messed up.
>
> import urllib2
>
> url = 'http://overseas.btchina.net/?categoryid=-1'
> # Send a browser-like User-Agent with the request.
> headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
> req = urllib2.Request(url, None, headers)
> page = urllib2.urlopen(req).read()
>
> htmlfile = open('btchina.html', 'w')
> htmlfile.write(page)
> htmlfile.close()
I dunno. The file does specify its charset, so unless IE ignores that
and tries to guess and fails, it should work fine.
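
One way to test that guess is to look at what charset the saved bytes declare and whether they really decode with it. The following is only a rough sketch, not a diagnosis of this particular page: it assumes the charset is declared in a <meta> tag near the top of the document, and the helper names (declared_charset, save_raw) and the sample bytes are made up for illustration. In the real script, the bytes returned by read() would take the place of the sample.

```python
import re


def declared_charset(page_bytes):
    """Return the charset named near the top of the page, or None if absent."""
    # Assumption: the declaration sits in the first few KB, as in a <meta> tag.
    match = re.search(br'charset\s*=\s*["\']?([A-Za-z0-9_-]+)',
                      page_bytes[:4096], re.I)
    return match.group(1).decode('ascii').lower() if match else None


def save_raw(page_bytes, path):
    """Write the bytes exactly as received, with no newline translation."""
    with open(path, 'wb') as f:
        f.write(page_bytes)


# Hypothetical stand-in for the downloaded page.
sample = (u'<html><head><meta http-equiv="Content-Type" '
          u'content="text/html; charset=gb2312"></head>'
          u'<body>\u4e2d\u6587</body></html>').encode('gb2312')

charset = declared_charset(sample)

# Decoding raises UnicodeDecodeError if the bytes do not match the
# declared charset, which would point at the page rather than the browser.
text = sample.decode(charset)
```

If the decode succeeds and the file was written in binary mode, the saved copy should be byte-for-byte what the server sent, and any remaining garbling is more likely down to how the browser picks an encoding.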