[Tutor] screen scraping without the request
Kent Johnson
kent37 at tds.net
Sun Apr 22 13:38:22 CEST 2007
Rohan Deshpande wrote:
> Hi All,
>
> The previous thread on screen scraping got me thinking of starting a
> similar project. However, the problem is that I have no idea what the POST
> request contains, as there is no query string after the URL when the resulting
> page comes up. I essentially need to pull the HTML from a page that is
> generated on a user's machine and pipe it into a Python script. How
> should I go about doing this? Is it possible/feasible to decipher the
> POST request and get the HTML, or should I use some screen scraping Python libs à
> la the JavaScript DOM hacks? I was thinking of the possibilities of the
> former, but the interaction on the site is such that the user enters a
> username/password and goes through a couple of links before getting to the
> page I need. Perhaps Python can use the session cookie and then pull
> the right page?
I think the mechanize library can help with this (though the site is
down at the moment so I can't check):
http://wwwsearch.sourceforge.net/mechanize/
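Since I can't check mechanize's docs right now, here is a rough standard-library sketch of the same idea, so treat it as an illustration rather than how mechanize itself works. It does two things. First, it reads the login page's HTML to find the form's action URL and field names, which is usually all you need to "decipher" the POST request. Second, it logs in through an opener that keeps the session cookie, then replays the links the user would click. The helper names, the default field names `username`/`password`, and every URL are hypothetical placeholders; take the real ones from the site's form.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class FormFieldLister(HTMLParser):
    """Collect the action, method and field names of every <form> on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            })
        elif tag in ("input", "select", "textarea") and self.forms and attrs.get("name"):
            # Only named controls are submitted, so unnamed ones are skipped.
            self.forms[-1]["fields"].append(attrs["name"])


def make_session():
    """Return (opener, jar); the opener stores cookies in jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password,
                        user_field="username", pass_field="password"):
    # Hypothetical default field names: use the ones FormFieldLister reports.
    data = urllib.parse.urlencode({user_field: username,
                                   pass_field: password}).encode("ascii")
    return urllib.request.Request(login_url, data=data)  # data present -> POST


def fetch_after_login(login_url, page_urls, username, password, **fields):
    """Log in, then open each URL in turn with the same cookie-carrying opener.

    Returns the HTML of the last page, i.e. the one you actually want.
    """
    opener, _jar = make_session()
    opener.open(build_login_request(login_url, username, password, **fields)).read()
    html = None
    for url in page_urls:  # replay the "couple of links" the user clicks through
        with opener.open(url) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    return html
```

In practice you would fetch the login page once, feed it to `FormFieldLister` to learn the field names and action URL (use `urllib.parse.urljoin` if the action is relative), and then call `fetch_after_login` with the sequence of page URLs. If the site builds those pages with JavaScript, a plain HTTP client like this won't see the result.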
Kent