[Madison] Python to accept terms and condition form a website + search

Nicola Branzoli nbranzol at ssc.wisc.edu
Sun May 8 19:04:08 CEST 2011



Many thanks to Eric for his suggestion.

I had already found a way to solve this problem by looking at how
inputs are passed to the JavaScript function __doPostBack(). The
solution I found is a little naive, but it works and uses urllib2;
this link [1] was useful. The mechanize documentation is still a
little obscure to me...
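
A minimal sketch of the __doPostBack() idea, not the code actually used
in this thread. It assumes Python 3's standard library (urllib.parse in
place of Python 2's urllib), and the link href, the control name
ctl00$main$acceptButton and the helper name postback_fields are all
hypothetical. On a typical ASP.NET page, __doPostBack(target, argument)
copies its two arguments into the hidden fields __EVENTTARGET and
__EVENTARGUMENT and submits the form, so a script can post those same
fields itself:

```python
import re
from urllib.parse import urlencode

# Matches href="javascript:__doPostBack('target','argument')"
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")


def postback_fields(href):
    """Return the form fields that __doPostBack would set for this href."""
    match = POSTBACK_RE.search(href)
    if match is None:
        raise ValueError("no __doPostBack call in %r" % href)
    target, argument = match.groups()
    return {"__EVENTTARGET": target, "__EVENTARGUMENT": argument}


# Hypothetical link, for illustration only
href = "javascript:__doPostBack('ctl00$main$acceptButton','')"
fields = postback_fields(href)

# URL-encoded body, ready to be sent as the data of a POST request
body = urlencode(fields)
print(body)
```

The encoded body would then be passed as the data argument of the
request, alongside any other fields the form expects.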

Here is a new problem I am facing:


Page:
http://emma.msrb.org/MarketActivity/RecentARS.aspx?showPopup=true [2]

On this page the user can enter various search criteria. Suppose I want

Auction Date, From: 05/02/2011 To: 05/06/2011

Here is how I did it (again using urllib2, because it seemed the most
promising approach given the previous success):


import urllib
import urllib2
from BeautifulSoup import BeautifulSoup

url = 'http://emma.msrb.org/MarketActivity/RecentARS.aspx?showPopup=true'  # [3]

headers = {'Cookie': 'DisclaimerCookie=yes;path=/'}

values = {
    '__EVENTTARGET': '',
    'ctl00$mainContentArea$searchPopup$auctionDateEndTextBox': '05/06/2011',
    'ctl00$mainContentArea$searchPopup$auctionDateBeginTextBox': '05/02/2011',
}

# POST the URL-encoded search dates together with the disclaimer cookie
dates_data = urllib.urlencode(values)
req_cusips1 = urllib2.Request(url, dates_data, headers)
response_cusips = urllib2.urlopen(req_cusips1)
the_cusips_page = response_cusips.read()

cusips_page = BeautifulSoup(the_cusips_page)

What I get back is the same page, with the values substituted in the
right place. The relevant part of the returned page is:

Auction Date 

I have tried different values for __EVENTTARGET, such as

#ctl00$mainContentArea$marketActivitySearchLinks$serachARSLink
#ctl00$mainContentArea$gridViewPagingUserControl$page1LinkButton

but without success: I always get back the page
http://emma.msrb.org/MarketActivity/RecentARS.aspx?showPopup=true [4]
with no results.
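
One thing that may be worth checking, offered as an assumption and not
verified against this page: ASP.NET WebForms pages usually carry hidden
inputs such as __VIEWSTATE and __EVENTVALIDATION, and the server
typically expects them to be echoed back in the POST along with the
search fields. The sketch below uses Python 3's standard library
(html.parser and urllib.parse, not the urllib2 used above); the sample
HTML and the names HiddenInputCollector and build_post_data are made up
for illustration. It collects the hidden fields from a page fetched
with a first GET request and merges them with the search criteria:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_post_data(page_html, search_values):
    """Merge the page's hidden fields with the search criteria."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    fields = dict(collector.fields)
    fields.update(search_values)
    return fields


# Made-up fragment standing in for the HTML returned by a first GET
sample_html = '''
<form method="post">
<input type="hidden" name="__VIEWSTATE" value="abc123" />
<input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
<input type="text"
       name="ctl00$mainContentArea$searchPopup$auctionDateBeginTextBox" />
</form>
'''

search_values = {
    'ctl00$mainContentArea$searchPopup$auctionDateBeginTextBox': '05/02/2011',
    'ctl00$mainContentArea$searchPopup$auctionDateEndTextBox': '05/06/2011',
}

post_fields = build_post_data(sample_html, search_values)

# Bytes body for a POST request
dates_data = urlencode(post_fields).encode("ascii")
```

If the search is triggered by a submit button, its name/value pair may
also need to be included in the posted fields; which button that is on
this page is not known here.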

A tentative attempt using mechanize is:

import mechanize
from BeautifulSoup import BeautifulSoup, BeautifulStoneSoup

url = 'http://emma.msrb.org/MarketActivity/RecentARS.aspx?showPopup=true'
headers = {'Cookie': 'DisclaimerCookie=yes;path=/'}
request = mechanize.Request(url, headers=headers)
response = mechanize.urlopen(request)
the_cusips_page = response.read()

cusips_page = BeautifulSoup(the_cusips_page)
forms = mechanize.ParseResponse(response, backwards_compat=False)

but forms is empty...

Any comments would help; I find the mechanize examples a little hard
to follow.

Thanks,
n

---
Nicola Branzoli
Ph.D. Candidate - University of Wisconsin Madison
William H. Sewell Social Science Building
1180 Observatory Drive
Madison, WI 53706-1393

On Thu, 05 May 2011 23:02:37 -0500, Eric Gierach wrote:

You probably have this figured out by now, but mechanize's Browser
object has a method, select_form, which allows you to set the browser's
focus on a particular form and submit it with the "click" method. Use
the select_form method's predicate argument to pass a function you
define to find the right form based on its content. It's easier than it
sounds.


Example code:

from mechanize import Browser, ControlNotFoundError, FormNotFoundError
from urllib2 import URLError

def find_form(form):
    """
    The browser calls this function with each form on the page. You need
    to find something unique about the form you're interested in and
    return True if the passed-in form has it. So, in this example, your
    TOS form has a "TOS_BTN" field. Let's search for that.
    """
    try:
        form.find_control("TOS_BTN")
    except ControlNotFoundError:
        return False
    return True

# initialize browser, set user agent
browser = Browser()
browser.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11)')]

# open the URL containing your TOS form
try:
    browser.open('http://www.example.com/TOS.html')  # [5]
except URLError:
    print "couldn't open the page"

# if your bot got a valid response
if browser.viewing_html():
    try:
        # give the TOS form focus; select_form raises if no form matches
        browser.select_form(predicate=find_form)
    except FormNotFoundError:
        print "couldn't find the TOS form"
    else:
        # optionally set other form fields
        browser.form["YOUR_NAME"] = "Mr. Spider"
        browser.form["NUM_RECORDS"] = "35"
        # browser.click generates a Request object which you can pass to
        # browser.open to submit the form.
        browser.open(browser.click())
        print "mission complete"

Here's the reference for mechanize forms:

http://wwwsearch.sourceforge.net/mechanize/forms.html [6]
(it's a mess)

Eric  

On Tue, May 3, 2011 at 1:04 PM, Nicola Branzoli wrote:

Hey, I am writing Python code to access public data online (using
BeautifulSoup). The task is relatively easy, but the code does not get
to the page I want because I first need to accept the terms and
conditions of the website (a standard 'Click the Accept'). I need to
tell Python how to accept the terms and conditions automatically and
proceed to the web address specified. I am new to Python; my guess is
that I have to use mechanize because cookielib is not good for this
job. Am I right? What other resources can I use? Any link with an
example similar to my problem would be great...

Thanks a lot! 

--

Nicola Branzoli
Ph.D. Candidate - University of Wisconsin Madison
William H. Sewell Social Science Building
1180 Observatory Drive
Madison, WI 53706-1393

_______________________________________________
Madison mailing list
Madison at python.org [8]
http://mail.python.org/mailman/listinfo/madison [9]



Links:
------
[1] http://stackoverflow.com/questions/1418000/how-to-click-a-link-that-has-javascript-dopostback-in-href
[2] http://emma.msrb.org/MarketActivity/RecentARS.aspx?showPopup=true
[3] http://emma.msrb.org/MarketActivity/RecentARS.aspx?showPopup=true
[4] http://emma.msrb.org/MarketActivity/RecentARS.aspx?showPopup=true
[5] http://www.example.com/TOS.html
[6] http://wwwsearch.sourceforge.net/mechanize/forms.html
[7] mailto:nbranzol at ssc.wisc.edu
[8] mailto:Madison at python.org
[9] http://mail.python.org/mailman/listinfo/madison