Querying a complex website

7stud bbxx789_05ss at yahoo.com
Tue Feb 19 18:04:06 EST 2008


schweet1 wrote:
> Greetings,
>
> I am attempting to use python to submit a query to the following URL:
>
> https://ramps.uspto.gov/eram/patentMaintFees.do
>
> The page looks simple enough - it requires submitting a number into 2
> form boxes and then selecting from the pull down.
>
> However, my test scripts have been hung up, apparently due to the
> several buttons on the page having the same name.  Ideally, I would
> have the script use the "Get Bibliographic Data" link.
>
> Any assistance would be appreciated.
>
> ~Jon

This is the part of the page's HTML source you are interested in:

-------------
<tr>
<td colspan=3><input type="submit" name="maintFeeAction" value="Retrieve Fees to Pay"> </td>
</tr>

<tr>
<td colspan=3><input type="submit" name="maintFeeAction" value="Get Bibliographic Data"> </td>
</tr>

<tr>
<td colspan=3><input type="submit" name="maintFeeAction" value="View Payment Windows"> </td>
</tr>
------------

1) When you click a submit button on a web page, your browser sends a
request to the URL listed in the action attribute of the enclosing
<form> tag, which in this case is:

<form name="mfInputForm" method="post" action="/eram/getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb">

The URL in the action attribute is a relative URL: it is a path that
starts with "/" and has no scheme or host name.  The current URL in
the address bar of your browser window is:

https://ramps.uspto.gov/eram/patentMaintFees.do

If you compare that to the URL in the action attribute of the <form>
tag:

---------
https://ramps.uspto.gov/eram/patentMaintFees.do

/eram/getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb
---------

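you can piece them together: keep the scheme and host
(https://ramps.uspto.gov) from the page URL and replace its path with
the one from the action attribute.  That gives the absolute URL: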


https://ramps.uspto.gov/eram/getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb
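The code later in this post is Python 2; in Python 3 the standard
library's urllib.parse.urljoin() performs this same resolution of a
relative URL against the page URL.  A minimal sketch, using the
session-specific jsessionid value copied from the page above:

```python
from urllib.parse import urljoin

# URL of the page that contains the form (what the address bar shows)
page_url = "https://ramps.uspto.gov/eram/patentMaintFees.do"

# Value of the <form> tag's action attribute, copied from the page source
action = "/eram/getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb"

# A relative URL starting with "/" keeps the page's scheme and host
# but replaces the page's entire path
absolute_url = urljoin(page_url, action)
print(absolute_url)
```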


2) When you click a submit button, the browser sends a request to that
URL.  The request contains everything you entered into the form as
name/value pairs: the name comes from the name attribute of each form
element, and the value is whatever was entered into that element.

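Because the submit buttons in the form have name attributes, the name
and value of the particular submit button you click are also added to
the request.  Only the button you click is included, so even though
all three buttons share the name "maintFeeAction", the server can tell
which one you pressed from its value.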
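To make the "same name, different value" point concrete, here is a
minimal Python 3 sketch that builds the body a browser would send for
a given button.  The placeholder numbers are the ones used later in
this post; form_body, BUTTON_NAME and BUTTON_VALUES are hypothetical
names chosen for illustration.

```python
from urllib.parse import urlencode

# Values typed into the two text boxes (placeholder numbers)
text_fields = {"patentNum": "1234567", "applicationNum": "08123456"}

# All three submit buttons share one name and differ only in value
BUTTON_NAME = "maintFeeAction"
BUTTON_VALUES = ["Retrieve Fees to Pay", "Get Bibliographic Data", "View Payment Windows"]

def form_body(clicked_value):
    """Return the urlencoded body sent when the given button is clicked."""
    if clicked_value not in BUTTON_VALUES:
        raise ValueError("no submit button with value %r" % clicked_value)
    pairs = dict(text_fields)
    # Only the clicked button's name/value pair becomes part of the request
    pairs[BUTTON_NAME] = clicked_value
    return urlencode(pairs)

print(form_body("Get Bibliographic Data"))
```

urlencode() encodes spaces as "+", which is the standard form encoding
for a POST body.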

3) To programmatically mimic what your browser does when you click a
submit button, send a POST request (the form specifies method="post")
directly to the URL in the form's action attribute.  Your request
should contain the name/value pairs that would have been sent to the
server if you had actually filled out the form and clicked the 'Get
Bibliographic Data' submit button.  The form contains these input
elements:

----
<input type="text" name="patentNum" maxlength="7" size="7" value="">

<input type="text" name="applicationNum" maxlength="8" size="8" value="">
----

and the submit button you want to click on is this one:

<input type="submit" name="maintFeeAction" value="Get Bibliographic Data">

So the name/value pairs you need to include in your request are (the
numbers are placeholders; substitute your own):

data = {
    'patentNum':'1234567',
    'applicationNum':'08123456',
    'maintFeeAction':'Get Bibliographic Data'
}
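Rather than copying these names out of the page source by hand, you
can have a script read them.  A minimal Python 3 sketch using the
standard library's html.parser; the HTML string is a trimmed-down copy
of the form elements quoted above, and the FormScraper class is a
hypothetical helper, not part of any library.

```python
from html.parser import HTMLParser

class FormScraper(HTMLParser):
    """Collect the first form's action and every <input> element."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.inputs = []  # (type, name, value) tuples in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input":
            self.inputs.append((attrs.get("type"), attrs.get("name"), attrs.get("value")))

page = """<form name="mfInputForm" method="post" action="/eram/getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb">
<input type="text" name="patentNum" maxlength="7" size="7" value="">
<input type="text" name="applicationNum" maxlength="8" size="8" value="">
<input type="submit" name="maintFeeAction" value="Retrieve Fees to Pay">
<input type="submit" name="maintFeeAction" value="Get Bibliographic Data">
<input type="submit" name="maintFeeAction" value="View Payment Windows">
</form>"""

scraper = FormScraper()
scraper.feed(page)
print(scraper.action)
for input_type, name, value in scraper.inputs:
    print(input_type, name, repr(value))
```

The output lists the same two text fields and three same-named submit
buttons identified above.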


Therefore, try something like this:

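import urllib

# Name/value pairs the browser would send for 'Get Bibliographic Data'
data = {
    'patentNum':'1234567',
    'applicationNum':'08123456',
    'maintFeeAction':'Get Bibliographic Data'
}

# Encode the pairs the same way a browser encodes a submitted form
enc_data = urllib.urlencode(data)

# Absolute URL from the form's action attribute; a string literal cannot
# be broken across lines, so the two halves are joined with parentheses
url = ('https://ramps.uspto.gov/eram/'
       'getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb')

# Passing data makes urlopen() send a POST, as the form's method="post" does
f = urllib.urlopen(url, enc_data)

print f.read()
f.close()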
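The code above is Python 2.  On Python 3, urllib.urlencode and
urllib.urlopen moved to urllib.parse and urllib.request, and the POST
body must be bytes.  A minimal sketch of the same request under those
assumptions; build_request and fetch are hypothetical helper names,
and fetch() is defined but not called, so the script itself does not
touch the network.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Name/value pairs for the "Get Bibliographic Data" button (placeholder numbers)
data = {
    "patentNum": "1234567",
    "applicationNum": "08123456",
    "maintFeeAction": "Get Bibliographic Data",
}

# Absolute action URL; its jsessionid belongs to one browser session
url = ("https://ramps.uspto.gov/eram/"
       "getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb")

def build_request():
    # In Python 3 the request body must be bytes, not str
    body = urlencode(data).encode("ascii")
    # Supplying data makes urllib send a POST, matching method="post"
    return Request(url, data=body)

def fetch():
    """Send the request and return the response body as text (network I/O)."""
    with urlopen(build_request(), timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")

req = build_request()
print(req.get_method(), req.full_url)
```

Call print(fetch()) to actually submit the form.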


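If that doesn't work, you may need to deal with cookies that the
server requires in order to keep track of you as you navigate from
page to page.  The jsessionid in the action URL is itself a session
identifier taken from my browser session, so it may have expired by
the time you try it.  In that case, please post a valid patent number
and application number, so that I can do some further tests.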
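If cookies do turn out to matter, one approach is to load the form
page first with a cookie-aware opener, read the current action URL
(with a fresh jsessionid) out of that page, and then POST with the
same opener.  A minimal Python 3 sketch under several assumptions: the
page is UTF-8, the form is still named mfInputForm with a
double-quoted action attribute, and make_session, find_action and
get_bibliographic_data are hypothetical helper names.

```python
import html
import re
from http.cookiejar import CookieJar
from urllib.parse import urlencode, urljoin
from urllib.request import HTTPCookieProcessor, build_opener

FORM_PAGE = "https://ramps.uspto.gov/eram/patentMaintFees.do"

def make_session():
    """Return an opener that stores cookies in a jar and sends them back later."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar)), jar

def find_action(page, base=FORM_PAGE):
    """Return the absolute action URL of the form named mfInputForm."""
    # Assumes double-quoted attributes, as in the page source quoted above
    form_tag = re.search(r'<form\b[^>]*\bname="mfInputForm"[^>]*>', page)
    if form_tag is None:
        raise ValueError("form mfInputForm not found")
    action = re.search(r'\baction="([^"]*)"', form_tag.group(0))
    if action is None:
        raise ValueError("form has no action attribute")
    # Undo HTML escaping such as &amp; before resolving the relative URL
    return urljoin(base, html.unescape(action.group(1)))

def get_bibliographic_data(patent_num, application_num):
    """Load the form page (picking up any cookies), then submit the form."""
    opener, _jar = make_session()
    with opener.open(FORM_PAGE, timeout=30) as resp:
        page = resp.read().decode("utf-8", errors="replace")  # assumes UTF-8
    body = urlencode({
        "patentNum": patent_num,
        "applicationNum": application_num,
        "maintFeeAction": "Get Bibliographic Data",
    }).encode("ascii")
    # The same opener sends the stored cookies along with this POST
    with opener.open(find_action(page), data=body, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage would be print(get_bibliographic_data(patent_num,
application_num)) with real numbers.  Reading the action URL from the
live page avoids reusing a stale jsessionid.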


