Querying a complex website

schweet1 jon.kappes at gmail.com
Wed Feb 20 15:26:45 EST 2008


On Feb 19, 4:04 pm, 7stud <bbxx789_0... at yahoo.com> wrote:
> schweet1 wrote:
> > Greetings,
>
> > I am attempting to use python to submit a query to the following URL:
>
> > https://ramps.uspto.gov/eram/patentMaintFees.do
>
> > The page looks simple enough - it requires submitting a number into 2
> > form boxes and then selecting from the pull down.
>
> > However, my test scripts have been hung up, apparently due to the
> > several buttons on the page having the same name.  Ideally, I would
> > have the script use the "Get Bibliographic Data" button.
>
> > Any assistance would be appreciated.
>
> > ~Jon
>
> This is the section you are interested in:
>
> -------------
> <tr>
> <td colspan=3>
> <input type="submit" name="maintFeeAction" value="Retrieve Fees to Pay">
> </td>
> </tr>
>
> <tr>
> <td colspan=3>
> <input type="submit" name="maintFeeAction" value="Get Bibliographic Data">
> </td>
> </tr>
>
> <tr>
> <td colspan=3>
> <input type="submit" name="maintFeeAction" value="View Payment Windows">
> </td>
> </tr>
> <tr>
> ------------
>
> 1) When you click on a submit button on a web page, a request is sent
> out for the web page listed in the action attribute of the <form> tag,
> which in this case is:
>
> <form name="mfInputForm" method="post"
>   action="/eram/getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb">
>
> The url specified in the action attribute is a relative url.  The
> current url in the address bar of your browser window is:
>
> https://ramps.uspto.gov/eram/patentMaintFees.do
>
> and if you compare that to the url in the action attribute of the
> <form> tag:
>
> ---------
> https://ramps.uspto.gov/eram/patentMaintFees.do
>
> /eram/getMaintFeesInfo.do;jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb
> ---------
>
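> you can piece them together. Because the action url starts with a
> slash, it replaces the whole path of the current url, which gives
> the absolute url: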
>
> https://ramps.uspto.gov/eram/getMaintFeesInfo.do;jsessionid=0000-MCoY...
>
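The same resolution can be done with the standard library rather than by hand. This is a minimal sketch in Python 3 (the code quoted in this thread is Python 2), using urllib.parse.urljoin with the page url and the action value shown above. The variable names are only illustrative, and the jsessionid is the one from the quoted form, so a fresh page load would carry a different value.

```python
from urllib.parse import urljoin

# The page the form lives on (the url in the browser's address bar).
page_url = 'https://ramps.uspto.gov/eram/patentMaintFees.do'

# The value of the form's action attribute, copied from the quoted HTML.
action = ('/eram/getMaintFeesInfo.do'
          ';jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb')

# A leading slash means "keep scheme and host, replace the whole path".
absolute_url = urljoin(page_url, action)
print(absolute_url)
```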
> 2) When you click on a submit button, a request is sent to that url.
> The request will contain all the information you entered into the form
> as name/value pairs.  The name is whatever is specified in the name
> attribute of a tag and the value is whatever is entered into the form.
>
> Because the submit buttons in the form have name attributes,  the name
> and value of the particular submit button that you click will be added
> to the request.
>
> 3)  To programmatically mimic what happens in your browser when you
> click on the submit button of a form, you need to send a request
> directly to the url listed in the action attribute of the <form>.
> Your request will contain the name/value pairs that would have been
> sent to the server if you had actually filled out the form and clicked
> on the 'Get Bibliographic Data' submit button.  The form contains
> these input elements:
>
> ----
> <input type="text" name="patentNum" maxlength="7" size="7" value="">
>
> <input type="text" name="applicationNum" maxlength="8" size="8"
> value="">
> ----
>
> and the submit button you want to click on is this one:
>
> <input type="submit" name="maintFeeAction" value="Get Bibliographic
> Data">
>
> So the name value pairs you need to include in your request are:
>
> data = {
>     'patentNum': '1234567',
>     'applicationNum': '08123456',
>     'maintFeeAction': 'Get Bibliographic Data',
> }
>
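To see what those pairs look like on the wire, here is a small Python 3 sketch that url-encodes the same dictionary and decodes it again. The placeholder numbers are the ones used above; the names fields, body and decoded are only illustrative. Only the button that was "clicked" appears in the body, which is how the server can tell three buttons with the same name apart.

```python
from urllib.parse import parse_qs, urlencode

# The two text boxes plus the one submit button we pretend to click.
fields = {
    'patentNum': '1234567',
    'applicationNum': '08123456',
    'maintFeeAction': 'Get Bibliographic Data',
}

# urlencode joins the pairs with '&' and turns spaces into '+'.
body = urlencode(fields)
print(body)
# patentNum=1234567&applicationNum=08123456&maintFeeAction=Get+Bibliographic+Data

# Decoding the body gives back the original values (each in a list).
decoded = parse_qs(body)
print(decoded['maintFeeAction'])
```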
> Therefore, try something like this:
>
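> # Python 2 code: in Python 3 these calls live in urllib.parse and
> # urllib.request.
> import urllib
>
> # The name/value pairs the browser would send for this form.
> data = {
>     'patentNum': '1234567',
>     'applicationNum': '08123456',
>     'maintFeeAction': 'Get Bibliographic Data',
> }
>
> enc_data = urllib.urlencode(data)
>
> # The url has to be one string; the jsessionid is the one the server
> # put in the form's action attribute.
> url = ('https://ramps.uspto.gov/eram/getMaintFeesInfo.do'
>        ';jsessionid=0000-MCoYNbJsaUCr2VfzZhKILX:11g0uepfb')
>
> # Passing data as the second argument makes urlopen send a POST.
> f = urllib.urlopen(url, enc_data)
>
> print f.read()
> f.close()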
>
> If that doesn't work, you may need to deal with cookies that the
> server requires in order to keep track of you as you navigate from
> page to page.  In that case, please post a valid patent number and
> application number, so that I can do some further tests.
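If cookies do turn out to be the problem, one possible approach is to load the form page first through a cookie-aware opener, read the current action url out of that page, and only then post the form. The Python 3 sketch below uses only the standard library. It has not been run against the USPTO site, and every name in it (FORM_PAGE, FormActionParser, find_form_action, make_opener, build_post, fetch_bibliographic_data) is made up for illustration. It also assumes the form needs no fields beyond the three discussed above.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser

FORM_PAGE = 'https://ramps.uspto.gov/eram/patentMaintFees.do'


class FormActionParser(HTMLParser):
    """Remember the action attribute of the first form with a given name."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.action = None

    def handle_starttag(self, tag, attrs):
        if tag == 'form' and self.action is None:
            attributes = dict(attrs)
            if attributes.get('name') == self.form_name:
                self.action = attributes.get('action')


def find_form_action(html, form_name):
    """Return the named form's action attribute, or None if not found."""
    parser = FormActionParser(form_name)
    parser.feed(html)
    return parser.action


def make_opener():
    """Return (opener, jar); the opener stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    handler = urllib.request.HTTPCookieProcessor(jar)
    return urllib.request.build_opener(handler), jar


def build_post(url, fields):
    """Build a request whose url-encoded body makes it a POST."""
    body = urllib.parse.urlencode(fields).encode('ascii')
    return urllib.request.Request(url, data=body)


def fetch_bibliographic_data(patent_num, application_num):
    """Load the form page, then submit it within the same cookie session."""
    opener, _jar = make_opener()
    # Step 1: fetch the form so the server can set its session cookie.
    with opener.open(FORM_PAGE) as page:
        html = page.read().decode('utf-8', errors='replace')
    # Step 2: use the action (and jsessionid) issued for this session.
    action = find_form_action(html, 'mfInputForm')
    if action is None:
        raise ValueError('form mfInputForm not found on the page')
    request = build_post(urllib.parse.urljoin(FORM_PAGE, action), {
        'patentNum': patent_num,
        'applicationNum': application_num,
        'maintFeeAction': 'Get Bibliographic Data',
    })
    # Step 3: post with the same opener so the cookie is sent back.
    with opener.open(request) as response:
        return response.read()

# Example call (performs real network requests):
# print(fetch_bibliographic_data('1234567', '08123456'))
```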

Thanks all. I think there are cookie issues. Here is an example data
pair to play with: patent number 6,725,879 (application number
10/102,919).  I'll post some of the code I've tried as soon as I can.
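The form boxes quoted above have maxlength 7 and 8, which suggests (but does not prove) that they expect bare digits without the commas and slash. Under that assumption, a small Python 3 sketch for turning the example pair into form values could look like this; digits_only is a hypothetical helper name.

```python
import re


def digits_only(text):
    """Drop commas, slashes and any other non-digit characters."""
    return re.sub(r'\D', '', text)


patent_num = digits_only('6,725,879')         # 7 digits
application_num = digits_only('10/102,919')   # 8 digits

# The name/value pairs for the "Get Bibliographic Data" button.
fields = {
    'patentNum': patent_num,
    'applicationNum': application_num,
    'maintFeeAction': 'Get Bibliographic Data',
}
print(fields)
```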



More information about the Python-list mailing list