MSIE6 Python Question

Ralph A. Gable r.gable at mchsi.com
Mon May 24 20:22:02 EDT 2004


Mike,
Thanks ever so much. That worked and helps tremendously. 
Ralph A. Gable

"Michael Geary" <Mike at DeleteThis.Geary.com> wrote in message news:<10b4bgsimk120ce at corp.supernews.com>...
> Ralph A. Gable wrote:
> > The data I want is being stripped out when I access the URL
> > via urllib. I CAN see the data when I go into IE and do view
> > source but when I use urllib the site intentionally blanks out
> > the information I want. For that reason, I would like to get it
> > using IE6 if I can. If there are other ways to fake out the site,
> > I would be interested in that also.
> 
> You may be able to get urllib or urllib2 to work using some of the other
> tips in this thread, such as the user agent string. Or it may have to do
> with cookies, in which case the ClientCookie module may be useful:
> 
> http://wwwsearch.sourceforge.net/ClientCookie/
> 
> If you do want to use IE, it's really easy. Let's assume you have an ie
> object that you've gotten with:
> 
> import win32com.client
> ie = win32com.client.Dispatch( 'InternetExplorer.Application' )
> 
> and you've navigated to your URL using ie.Navigate( url ), and you've waited
> for Navigate to finish. Then, you can get the document with:
> 
> doc = ie.Document
> 
> From there, you can get to anything. If you want the entire HTML source,
> it's:
> 
> doc.documentElement.outerHTML
> 
> Or better yet, you can use the IE object model to let IE do the work of
> parsing the HTML for you. For example, suppose the document contains a form
> named 'loginForm' with 'username' and 'password' fields, and you want to
> fill in those two fields and submit the form. You could do it with:
> 
> form = doc.forms.loginForm
> form.username.value = 'myname'
> form.password.value = 'mypassword'
> form.submit()
> 
> Basically, you can use about the same code you'd use in JavaScript or Visual
> Basic inside the web page.
> 
> Here's the MSDN reference for the InternetExplorer object:
> 
> http://msdn.microsoft.com/workshop/browser/webbrowser/reference/objects/internetexplorer.asp
> 
> And here's the reference for the document object:
> 
> http://msdn.microsoft.com/workshop/author/dhtml/reference/objects/obj_document.asp
> 
> (Sorry about the long URLs; you know what to do.)
> 
> One other note: You probably already know this, but after you call
> Navigate, you need to wait until IE has loaded the page. You can either
> use the NavigateComplete2 event, or it may be easier to cheat a bit and use
> a loop with time.sleep() and test the ie.Busy property. I like to wait until
> ie.Busy is false and remains false for a couple of seconds, to avoid being
> tripped up by redirects where Busy may go false momentarily and then become
> true again during the redirect.
> 
> -Mike
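
For readers trying the non-IE route first, here is a minimal sketch of the
user-agent and cookie suggestions from the quoted reply. It assumes Python 3,
where urllib.request and http.cookiejar from the standard library stand in
for the urllib2 and ClientCookie modules mentioned above. The USER_AGENT
string and the helper name build_browser_like_opener are illustrative
assumptions, not something taken from the thread, and whether this is enough
to get the missing data depends on how the site decides what to blank out.

```python
import http.cookiejar
import urllib.request

# Assumed example of an IE6-style User-Agent string; adjust as needed.
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def build_browser_like_opener(user_agent=USER_AGENT):
    """Return (opener, cookie_jar).

    The opener sends `user_agent` instead of the default Python-urllib
    header and stores any cookies the server sets in `cookie_jar`, so
    they are sent back on later requests made through the same opener.
    """
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    # Replace the default header list, which only holds User-agent.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, cookie_jar
```

To fetch a page, call opener.open(url).read() on the returned opener; reusing
the same opener for a login page and then the data page lets the cookies
carry over between the two requests.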

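The wait described in the last paragraph of the quoted reply (poll ie.Busy
with time.sleep() until it has stayed false for a couple of seconds) could be
sketched roughly as follows. The function name wait_until_idle and its
parameters are hypothetical; it takes a callable instead of the ie object
itself so that it can be tried without IE, and the timeout is an addition
that the reply does not mention.

```python
import time


def wait_until_idle(is_busy, settle=2.0, poll=0.1, timeout=60.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Block until is_busy() has stayed false for `settle` seconds.

    Returns True once that happens, or False if `timeout` seconds
    pass first. `clock` and `sleep` are injectable for testing.
    """
    start = clock()
    idle_since = None  # when the current uninterrupted idle stretch began
    while True:
        now = clock()
        if is_busy():
            # Busy again (for example, a redirect started): restart the count.
            idle_since = None
        else:
            if idle_since is None:
                idle_since = now
            if now - idle_since >= settle:
                return True
        if now - start >= timeout:
            return False
        sleep(poll)
```

With an ie object obtained as in the quoted reply, the call after
ie.Navigate( url ) would be wait_until_idle(lambda: ie.Busy), after which
ie.Document should be safe to read.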