[Madison] Python to accept terms and conditions from a website + search
Nicola Branzoli
nbranzol at ssc.wisc.edu
Sun May 8 19:04:08 CEST 2011
Many thanks to Eric for his suggestion.
I found a way to solve this problem by looking at how to pass inputs to
the JavaScript function __doPostBack(). The solution I found is a little
naive, but it works and uses urllib2; this link [1] was useful. The
documentation on mechanize is still a little obscure to me...
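For readers following along, here is a minimal sketch of the idea behind parsing __doPostBack() inputs: the two arguments of the call are what an ASP.NET page places in the __EVENTTARGET and __EVENTARGUMENT form fields before submitting. The helper name parse_postback and the control name in the example href are hypothetical, not taken from the actual site.

```python
import re

# Matches calls such as: javascript:__doPostBack('ctl00$main$link','')
_POSTBACK_RE = re.compile(r"__doPostBack\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)")


def parse_postback(href):
    """Return the form fields a __doPostBack() call would set, or None."""
    match = _POSTBACK_RE.search(href)
    if match is None:
        return None
    target, argument = match.groups()
    return {'__EVENTTARGET': target, '__EVENTARGUMENT': argument}


# Hypothetical href, for illustration only.
fields = parse_postback(
    "javascript:__doPostBack('ctl00$mainContentArea$acceptLink','')")
```

The resulting dictionary can be merged into the values that are URL-encoded and posted back to the page.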
Here is a new problem I am facing:
Page:
http://emma.msrb.org/MarketActivity/RecentARS.aspx?showPopup=true [2]
On this page the user can enter various search criteria. Suppose I want
Auction Date, From: 05/02/2011 To: 05/06/2011
Here is the way I did it (using urllib2 again, because it was the
promising approach given the previous success):
import urllib
import urllib2
from BeautifulSoup import BeautifulSoup

url = 'http://emma.msrb.org/MarketActivity/RecentARS.aspx?showPopup=true'
# cookie that marks the disclaimer as accepted
headers = {'Cookie': 'DisclaimerCookie=yes;path=/'}
values = {
    '__EVENTTARGET': '',
    'ctl00$mainContentArea$searchPopup$auctionDateBeginTextBox': '05/02/2011',
    'ctl00$mainContentArea$searchPopup$auctionDateEndTextBox': '05/06/2011',
}
dates_data = urllib.urlencode(values)
# passing data makes this a POST request
req_cusips1 = urllib2.Request(url, dates_data, headers)
response_cusips = urllib2.urlopen(req_cusips1)
the_cusips_page = response_cusips.read()
cusips_page = BeautifulSoup(the_cusips_page)
What I get back is the same page, with the values substituted in the
right place. The relevant part of the page I get is:
Auction Date
I have tried different values for __EVENTTARGET, such as
#ctl00$mainContentArea$marketActivitySearchLinks$serachARSLink
#ctl00$mainContentArea$gridViewPagingUserControl$page1LinkButton
but without success: I always get only the page
http://emma.msrb.org/MarketActivity/RecentARS.aspx?showPopup=true [4]
and no search results.
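A common reason for an ASP.NET WebForms page to redisplay itself without running the search is that the POST lacks the hidden state fields (such as __VIEWSTATE and __EVENTVALIDATION) that the page sent with its form; whether that is the cause here is an assumption, not something verified against this site. The sketch below collects every hidden input from a fetched page and merges the search values on top. The names HiddenInputCollector and build_postback_payload, and the sample HTML with its field values, are made up for illustration.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2, as used in this thread


class HiddenInputCollector(HTMLParser):
    """Collect the name/value pair of every <input type="hidden">."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attrs = dict(attrs)
        if (attrs.get('type') or '').lower() == 'hidden' and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


def build_postback_payload(page_html, overrides):
    """Start from the page's hidden fields, then apply our own values."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)
    payload.update(overrides)
    return payload


# Made-up page fragment; a real page would come from a first GET request.
sample_page = '''
<form method="post" action="RecentARS.aspx?showPopup=true">
<input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />
<input type="hidden" name="__EVENTVALIDATION" value="wEWAgK" />
<input type="text" name="ctl00$mainContentArea$searchPopup$auctionDateBeginTextBox" />
</form>
'''
payload = build_postback_payload(sample_page, {
    'ctl00$mainContentArea$searchPopup$auctionDateBeginTextBox': '05/02/2011',
    'ctl00$mainContentArea$searchPopup$auctionDateEndTextBox': '05/06/2011',
})
```

The payload would then be URL-encoded and posted with the same Cookie header as before. The page may also expect the name/value pair of the search button itself, which would have to be read from the page source.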
A tentative attempt using mechanize is:
import mechanize
from BeautifulSoup import BeautifulSoup, BeautifulStoneSoup

url = 'http://emma.msrb.org/MarketActivity/RecentARS.aspx?showPopup=true'
headers = {'Cookie': 'DisclaimerCookie=yes;path=/'}
request = mechanize.Request(url, headers=headers)
response = mechanize.urlopen(request)
the_cusips_page = response.read()
cusips_page = BeautifulSoup(the_cusips_page)
forms = mechanize.ParseResponse(response, backwards_compat=False)
but forms is empty...
Any comments would help; I find it a little hard to follow the examples
in mechanize.
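One possible explanation, offered as an assumption rather than a confirmed diagnosis: response.read() has already consumed the response body by the time ParseResponse is called, so the parser may be handed an empty stream. The sketch below illustrates the effect with io.BytesIO standing in for the response and a hypothetical count_forms helper standing in for the form parser; neither is part of mechanize.

```python
import io
import re


def count_forms(stream):
    """Hypothetical stand-in for a form parser: count <form> tags in a stream."""
    return len(re.findall(br'<form\b', stream.read()))


# Stand-in for an HTTP response body.
response = io.BytesIO(b'<html><form id="aspnetForm"></form></html>')

page_bytes = response.read()               # the first read consumes the body
forms_after_read = count_forms(response)   # the parser now sees nothing

# Parsing from a fresh stream built on the saved bytes still works.
forms_from_copy = count_forms(io.BytesIO(page_bytes))
```

If this is indeed the cause, calling ParseResponse before read(), or opening the page through a mechanize Browser object, might avoid it.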
Thanks
n
---
Nicola Branzoli
Ph.D. Candidate - University of Wisconsin Madison
William H. Sewell Social Science Building
1180 Observatory Drive
Madison, WI 53706-1393
On Thu, 05 May 2011 23:02:37 -0500, Eric Gierach wrote:
You probably have this figured out by now, but mechanize's Browser
object has a method, select_form, which allows you to set the browser's
focus on a particular form; you can then submit it with the "click"
method. Use the select_form method's predicate argument to pass a
reference to a function you define to find the right form based on its
content. It's easier than it sounds.
Example code:
from mechanize import Browser, FormNotFoundError
from urllib2 import URLError

def find_form(form):
    """
    The browser calls this function with each form on the page. You need
    to find something unique about the form you're interested in and
    return True if the passed-in form has it. So, in this example, your
    TOS form has a "TOS_BTN" field. Let's search for that.
    """
    return any(control.name == "TOS_BTN" for control in form.controls)

# initialize browser, set user agent
browser = Browser()
browser.addheaders = [('User-Agent',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11)')]

# open the URL containing your TOS form
try:
    browser.open('http://www.example.com/TOS.html')
except URLError:
    print "couldn't open the page"

# if your bot got a valid response
if browser.viewing_html():
    try:
        # select_form returns None; it raises FormNotFoundError
        # when no form satisfies the predicate
        browser.select_form(predicate=find_form)
    except FormNotFoundError:
        print "couldn't find the TOS form"
    else:
        # optionally set other form fields
        browser.form["YOUR_NAME"] = "Mr. Spider"
        browser.form["NUM_RECORDS"] = "35"
        # browser.click generates a Request object which you can pass
        # to browser.open to submit the form.
        browser.open(browser.click())
        print "mission complete"
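A predicate of this kind can be exercised without touching the network. The sketch below is an illustration under assumptions: FakeControl and FakeForm are hypothetical stand-ins that only mimic the controls list and name attribute of mechanize form objects, and has_control and find_tos_form are illustrative names.

```python
from collections import namedtuple

# Hypothetical stand-ins for mechanize's form and control objects.
FakeControl = namedtuple('FakeControl', ['name'])
FakeForm = namedtuple('FakeForm', ['controls'])


def has_control(form, control_name):
    """True if any control on the form carries the given name."""
    return any(control.name == control_name for control in form.controls)


def find_tos_form(form):
    """Predicate in the style expected by select_form(predicate=...)."""
    return has_control(form, 'TOS_BTN')


# Two fake forms: a search box and a terms-of-service form.
forms = [
    FakeForm([FakeControl('q')]),
    FakeForm([FakeControl('YOUR_NAME'), FakeControl('TOS_BTN')]),
]
matches = [index for index, form in enumerate(forms) if find_tos_form(form)]
```

Here only the second fake form satisfies the predicate, which mirrors how the browser would try each form on the page in turn.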
Here's the reference for mechanize forms:
http://wwwsearch.sourceforge.net/mechanize/forms.html [6]
(it's a mess)
Eric
On Tue, May 3, 2011 at 1:04 PM, Nicola Branzoli wrote:
Hey, I am writing code in Python to access public data online (using
BeautifulSoup).
The task is relatively easy, but the code does not get to the page I
want because I need to accept the terms and conditions of the website
first (by a standard 'Click the Accept').
I need to tell Python how to automatically accept the terms and
conditions and proceed to the web address specified. I am new to Python;
my guess is that I have to use mechanize because cookielib is not good
for this job. Am I right? What other resources can I use? Any link with
an example similar to my problem would be great...
Thanks a lot!
_______________________________________________
Madison mailing list
Madison at python.org [8]
http://mail.python.org/mailman/listinfo/madison [9]
Links:
------
[1] http://stackoverflow.com/questions/1418000/how-to-click-a-link-that-has-javascript-dopostback-in-href
[2] http://emma.msrb.org/MarketActivity/RecentARS.aspx?showPopup=true
[3] http://emma.msrb.org/MarketActivity/RecentARS.aspx?showPopup=true
[4] http://emma.msrb.org/MarketActivity/RecentARS.aspx?showPopup=true
[5] http://www.example.com/TOS.html
[6] http://wwwsearch.sourceforge.net/mechanize/forms.html
[7] mailto:nbranzol at ssc.wisc.edu
[8] mailto:Madison at python.org
[9] http://mail.python.org/mailman/listinfo/madison