open html page for parsing

Chris Rebert clp2 at rebertia.com
Tue Oct 4 03:22:31 EDT 2011


On Mon, Oct 3, 2011 at 11:58 PM, luca72 <lucaberto at libero.it> wrote:
> Hello i have a simple question:
> up to now if i have to parse a page i do as follow:
> import urllib
> site = urllib.urlopen('http://www.blabla.ooo')
> list_a = site.readline()
> site.close()
> __and then i make my work__
>
> Now i have the site that is open by an html file like this:
>                <table>
> <b>insert the password</b>
> <form name="form_a" method="POST" action="http://lalal.hhdik/
> lolo.php">
>                                <input type="passwd" name="password" value="password"></input>;
>                                <input type="submit" name="Entra" value="entra"></input>;
>                        </form>
>                </table>
>
>
> </div>
> <script type="text/javascript">
> document.form_a.submit();
> </script>
>
> this is in a file called example.html
>
> how can i open it with urllib

Assuming you meant "How do I submit the form in example.html and get
the resulting response page?", use urllib.urlencode() to encode the
form's key-value pairs, and then pass the encoded result as the `data`
argument to urllib.urlopen(); supplying `data` makes the request a
POST, which matches the form's method. Or use something like Selenium
(http://seleniumhq.org/ ), mechanize
(http://wwwsearch.sourceforge.net/mechanize/ ), or Scrapy
(http://scrapy.org/ ) that emulates/drives a web browser and lets you
fill out forms programmatically.
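
For the urllib route, something along these lines should work. It is
only a rough sketch, written for Python 3, where urlencode() lives in
urllib.parse and urlopen() in urllib.request (in Python 2 both are in
plain urllib). The action URL and field names are copied from your
example.html; the helper names build_form_request() and submit_form()
are made up for illustration. Sending the "Entra" field is an
assumption on my part: a form submitted from JavaScript does not
include the submit button, so the server may not need it.

```python
# Python 3 sketch; in Python 2 use urllib.urlencode() and urllib.urlopen().
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Values copied from the <form> in example.html.
FORM_ACTION = "http://lalal.hhdik/lolo.php"
# Including "Entra" is an assumption; drop it if the server rejects it.
FORM_FIELDS = [("password", "password"), ("Entra", "entra")]


def build_form_request(action=FORM_ACTION, fields=FORM_FIELDS):
    """Build (but do not send) the POST request the form would submit."""
    # urlencode() returns str; urlopen() needs the body as bytes.
    body = urlencode(fields).encode("ascii")
    # A Request that carries data is sent as POST rather than GET.
    return Request(action, data=body)


def submit_form(action=FORM_ACTION, fields=FORM_FIELDS):
    """Send the form data and return the response page as bytes."""
    response = urlopen(build_form_request(action, fields))
    try:
        return response.read()
    finally:
        response.close()
```

Calling submit_form() gives you the page the browser would have landed
on, and you can then parse it the same way you parse pages now.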

Cheers,
Chris
--
http://rebertia.com


