[Tutor] ClientForm, or how to log in to a web site?

Kent Johnson kent_johnson at skillsoft.com
Thu Nov 4 18:01:05 CET 2004


Looking at the logon page you cite, the Valider button calls a JavaScript
function, Identification().

In the ClientForm General FAQ I found this question:
Embedded script is messing up my web-scraping. What do I do?

One suggested option is this:
  * Simply figure out what the embedded script is doing and emulate it in 
your Python code: for example, by manually adding cookies to your CookieJar 
instance, calling methods on HTMLForms, calling urlopen, etc.
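For a login script, the cookie half of that advice usually means making sure the session cookie the site sets at login is sent back on every later request. Here is a minimal sketch. It assumes Python 3, where `http.cookiejar` and `urllib.request` replace the `cookielib` and `urllib2` modules of this era. `make_session_opener` is a hypothetical helper name, not part of ClientForm:

```python
import http.cookiejar
import urllib.request


def make_session_opener():
    """Return (opener, jar): an opener that stores and resends cookies.

    Hypothetical helper for illustration. Every request made through the
    returned opener shares one CookieJar, so a session cookie set by the
    login response is sent automatically with later requests.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


# Usage (needs network access, so left commented out):
# opener, jar = make_session_opener()
# response = opener.open("https://interactif.creditlyonnais.fr/")
# for cookie in jar:
#     print(cookie.name, cookie.value)
```

Using one opener for the whole session, rather than the bare `urlopen()`, is what keeps the cookies attached between the login POST and the request for the protected page.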

The function is not defined in the page source itself, but the page references
a few external JavaScript files. identmig.js looked promising, so I loaded it
by browsing to
https://interactif.creditlyonnais.fr/js/identmig.js
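Reading the script in a browser works fine. If you need to re-check it later (for example, after the site changes), a small helper can pull one function's source out of the downloaded file. The sketch below uses a hypothetical helper, `extract_js_function`, which is not part of any library. It assumes the script declares the function in the plain `function Name(...) { ... }` form:

```python
import re


def extract_js_function(source, name):
    """Return the text of `function name(...) { ... }` from JavaScript source.

    Hypothetical helper, a rough sketch: it finds the declaration with a
    regular expression, then counts braces to find the end of the body.
    Braces inside strings, comments or regex literals will confuse it, so
    treat the result as a reading aid only. Returns None if not found.
    """
    match = re.search(r"function\s+" + re.escape(name) + r"\s*\(", source)
    if match is None:
        return None
    start = match.start()
    open_brace = source.find("{", match.end())
    if open_brace == -1:
        return None
    depth = 0
    for i in range(open_brace, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return source[start:i + 1]
    return None  # braces never balanced
```

With network access, you could pass it the text of `urllib.request.urlopen("https://interactif.creditlyonnais.fr/js/identmig.js").read().decode("latin-1")` with `"ValidLogin"` as the name. The `latin-1` encoding is a guess.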

Most of the work of Identification() is done by the function ValidLogin(),
which just examines the form data and sets the submit URL based on it.

So I have two suggestions:
- Stop using ClientForm. Work out from the HTML form and the JavaScript what
data needs to be posted, and post it directly using urllib2.urlopen() with a
data argument.
- Emulate the JavaScript: set the submit URL on the ClientForm form yourself
before calling form.click().
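The first suggestion might look like the minimal sketch below. It again assumes Python 3's `urllib.parse` and `urllib.request` in place of `urllib2`. The field names `agenceId`, `compteId` and `CodeId` come from the code quoted below. The real form may contain other fields, such as hidden inputs, that must also be sent. `build_login_request` is a hypothetical helper, and the URL in the usage line is a placeholder for whatever ValidLogin() actually computes:

```python
import urllib.parse
import urllib.request


def build_login_request(url, fields):
    """Build a POST request carrying `fields` as form-encoded data.

    Hypothetical helper. `url` should be the target that the site's
    JavaScript (ValidLogin()) would have set on the form.
    """
    data = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying data makes urllib send a POST instead of a GET.
    request = urllib.request.Request(url, data=data)
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


# Placeholder URL: replace it with the target worked out from identmig.js.
request = build_login_request(
    "https://example.invalid/login-target",
    {"agenceId": "XXXXX", "compteId": "XXXXXX", "CodeId": "XXXXX"},
)
```

To send it, call `urllib.request.urlopen(request)`. If the site relies on a session cookie, open the request through an opener built with `urllib.request.HTTPCookieProcessor` instead, so the cookie carries over to the next page.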

Kent

At 10:50 PM 11/3/2004 -0500, kbond wrote:
>Hello all of you,
>
>I am starting a project with a simple objective: log in to a web site
>(https://interactif.creditlyonnais.fr/), access a protected area, and then
>save a web page on my computer.
>The idea is to do this daily and then process the collection of files to
>compute some statistics.
>
>After googling, I found a Python module called ClientForm
>(http://wwwsearch.sourceforge.net/ClientForm/). Here is what the author
>wrote about it: "ClientForm is a Python module for handling HTML forms on
>the client side, useful for parsing HTML forms, filling them in and
>returning the completed forms to the server."
>
>The problem is that I am not able to submit my form back to the server.
>My goal is to retrieve the page that comes after the login form.
>
>You will find my code below:
>+++++++++++++++++++++++++++
>import urllib2
>from urllib2 import urlopen
>from ClientForm import ParseResponse
>forms = ParseResponse(urlopen("https://interactif.creditlyonnais.fr"))
>form = forms[0]
>print form
>
>form["agenceId"]="XXXXX"
>form["compteId"]="XXXXXX"
>form["CodeId"]="XXXXX"
>
>print form
>request= form.click() ##My guess is that I am missing something there
>print request
>response= urlopen(request)
>print response.geturl() ##Unfortunately, I am getting the original URL here.
>print response.read()
>response.close()
>+++++++++++++++++++++++++++
>
>Any help with ClientForm or another solution would be very welcome.
>
>Thank you for your help
>_______________________________________________
>Tutor maillist  -  Tutor at python.org
>http://mail.python.org/mailman/listinfo/tutor


