Slurping Web Pages

Peter Hansen peter at engcorp.com
Sat Jan 25 14:53:39 EST 2003


Tony Dunn wrote:
> 
> I've started a new project where I need to slurp web pages from a site that
> uses cookies to authenticate access.  I've used *urllib* in the past to grab
> *public* web pages, but I'm not sure the best way to go about dealing with
> the cookie issue.
> 
> I found some code to drive IE via COM, but I can't find a method to save the
> current web page to a file so I can *slurp* it later.  I've wandered through
> the file generated by makepy.py for the *Internet Control* COM object, but I
> don't see what I'm looking for.  I know I can grab the files from the local
> *Internet* cache, but I'd like the option to specify a file location and
> file name for each page downloaded.
> 
> Has anyone done this with IE?

After you've navigated to the page and the document has finished loading,
grab the document's documentElement and its outerHTML attribute like so,
and save it to a file.

from win32com.client import DispatchEx
import time

ie = DispatchEx('InternetExplorer.Application')
ie.Navigate('http://www.yoururl.com')
# Navigate returns immediately; wait until the page has actually loaded.
while ie.Busy or ie.ReadyState != 4:    # 4 == READYSTATE_COMPLETE
    time.sleep(0.2)
html = ie.Document.documentElement.outerHTML
open('page.html', 'w').write(html)
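Since you also wanted to specify a file name for each page downloaded, one
simple approach (plain standard-library Python, nothing IE-specific -- the
helper name and the sample URL below are just made up for illustration) is
to derive a filesystem-safe name from each URL:

```python
import re
from urllib.parse import urlparse

def url_to_filename(url, ext='.html'):
    """Map a URL to a filesystem-safe file name (illustrative helper)."""
    parsed = urlparse(url)
    # Combine host and path, then collapse anything unsafe to underscores.
    stem = parsed.netloc + parsed.path
    stem = re.sub(r'[^A-Za-z0-9._-]+', '_', stem).strip('_') or 'index'
    return stem + ext

print(url_to_filename('http://www.yoururl.com/reports/2003/jan.asp?id=7'))
# -> www.yoururl.com_reports_2003_jan.asp.html
```

You could then pass that name to whatever you use to write the slurped
HTML out, one file per page.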

-Peter




More information about the Python-list mailing list