[Tutor] screen scraping without the request

Kent Johnson kent37 at tds.net
Sun Apr 22 13:38:22 CEST 2007


Rohan Deshpande wrote:
> Hi All,
> 
> the previous thread on screen scraping got me thinking of starting a 
> similar project.  However, the problem is I have no idea what the POST
> request is, as there is no query string after the URL when the resulting
> page comes up.  I essentially need to pull the HTML from a page that is
> generated on a user's machine and pipe it into a Python script.  How
> should I go about doing this?  Is it possible/feasible to decipher the
> POST request and get the HTML, or use some screen-scraping Python libs a
> la the JavaScript DOM hacks? I was thinking of the possibilities of the
> former, but the interaction on the site is such that the user enters a 
> username/password and goes through a couple links before getting to the 
> page I need.  Perhaps Python can use the session cookie and then pull 
> the right page?

I think the mechanize library can help with this (though the site is 
down at the moment so I can't check):
http://wwwsearch.sourceforge.net/mechanize/
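
On the session cookie idea: the following is only a rough sketch of the
general approach, written against the standard library (Python 3 module
names) rather than mechanize. It logs in with a POST, keeps whatever
cookies the server sets, and then requests the page you want with the same
cookies. The helper names, the URLs, and the form field names are all
made up for illustration; you would need to read the real login form's
HTML to find the actual action URL and field names.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_session_opener():
    """Return (opener, jar). The opener stores cookies the server sets
    and sends them back on later requests made through it."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def encode_form(fields):
    """Encode a dict of form fields as a POST body."""
    return urllib.parse.urlencode(fields).encode("ascii")


def fetch_after_login(opener, login_url, fields, target_url):
    """POST the login form, then GET the target page with the same cookies.

    login_url, fields and target_url are assumptions supplied by the
    caller; nothing here knows about any particular site.
    """
    # Passing data= makes this a POST; the response body is not needed.
    opener.open(login_url, data=encode_form(fields)).close()
    with opener.open(target_url) as response:
        return response.read().decode("utf-8", errors="replace")
```

You would call it along the lines of
fetch_after_login(opener, "http://example.com/login",
{"username": "...", "password": "..."}, "http://example.com/report"),
where both URLs and both field names are placeholders. If the site needs
you to follow a couple of links after logging in, you can open each one in
turn with the same opener before fetching the final page.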

Kent

