[Tutor] Cataloging Web Page Information

Kent Johnson kent37 at tds.net
Sat Mar 5 06:50:34 CET 2005


Anderson wrote:
> Hello,
> 
> I currently have access to a webpage that has information that I'd
> like to put into a CSV (comma-separated value) spreadsheet. Its
> frontend is a form; you fill the form out by entering some text and
> selecting the appropriate option from a drop down menu, and then you
> press the submit form button.

You may be able to simply send the same data the form sends, without actually reading the form from 
the web site. If the form uses GET, you can see the data it sends in the browser's URL bar after 
you submit the form. If the form uses POST, you will have to look at the form's HTML to see how 
the data is formatted.
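
For a GET form, the query string is easy to reproduce by hand. A minimal sketch — the field names 
("query", "category") and the URL are made up; replace them with whatever appears in your browser's 
URL bar after a real submit:

```python
# Build the same query string the browser shows after a GET form submit.
# urlencode moved between Python versions, so try both names.
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

# Hypothetical form fields -- copy the real names/values from the URL bar.
form_data = [("query", "spam eggs"), ("category", "all")]
url = "http://example.com/search?" + urlencode(form_data)
print(url)   # http://example.com/search?query=spam+eggs&category=all

# urllib2.urlopen(url) (urllib.request.urlopen on Python 3) would then
# fetch the results page.
```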

Alternatively, you can use ClientForm to fill out the actual form and submit it:
http://wwwsearch.sourceforge.net/ClientForm/

> 
> The webpage subsequently displays links to every entry it can find and
> when you click on the link you get the full info of that entry.

BeautifulSoup can help you pull the links out of the reply.
http://www.crummy.com/software/BeautifulSoup/
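
BeautifulSoup is far more forgiving of real-world HTML, but the link-extraction idea can be 
sketched with nothing beyond the standard library's HTMLParser (the sample page below is made up):

```python
# Collect every href from the <a> tags in a page, stdlib only.
# HTMLParser moved between Python versions, so try both names.
try:
    from HTMLParser import HTMLParser     # Python 2
except ImportError:
    from html.parser import HTMLParser    # Python 3

class LinkCollector(HTMLParser):
    """Accumulate the href attribute of every <a> tag fed to the parser."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# A made-up results page standing in for the site's real reply.
page = '<html><body><a href="/entry?id=1">One</a> <a href="/entry?id=2">Two</a></body></html>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)   # ['/entry?id=1', '/entry?id=2']
```

Each collected href can then be joined against the site's base URL and fetched in turn.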

urllib2 (in the standard library) can retrieve the final web page; if you need to parse it, use 
BeautifulSoup again.
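
Once the fields of each entry are in hand, the csv module Anderson mentions writes them out 
directly. A sketch with hypothetical field names; io.StringIO stands in for a real output file:

```python
import csv
import io

# Hypothetical records scraped from the detail pages -- the real field
# names depend entirely on the site.
records = [
    {"name": "First Entry", "phone": "555-0100"},
    {"name": "Second, Entry", "phone": "555-0101"},
]

out = io.StringIO()   # in a real script: open("entries.csv", "w")
writer = csv.writer(out)
writer.writerow(["name", "phone"])
for rec in records:
    writer.writerow([rec["name"], rec["phone"]])

# The writer quotes the value containing a comma automatically.
print(out.getvalue())
```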

Kent

> 
> I want to automate the whole process with a python script. I figured
> out that the csv module(s) would be best for storing the received
> information but what module should I use to access the website, fill
> out the form, enter each resulting link and retrieve the information?
> 
> Thanks for your time,
> Anderson
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
> 
