Python simulate browser activity

Roy Smith roy at panix.com
Fri Mar 16 08:57:30 EDT 2012


In article 
<214c4c0c-f8ec-4030-946b-8becc8e1aa9c at ur9g2000pbc.googlegroups.com>,
 choi2k <rex.0510 at gmail.com> wrote:

> Hi, everyone
> 
> I am trying to write a small application using python but I am not
> sure whether it is possible to do so..
> The application aims to simulate user activity including visit a
> website and perform some interactive actions (click on the menu,
> submit a form, redirect to another pages...etc)
> I have found some libraries / plugins which aims to simulate browser
> activity but ... none of them support AJAX request and/or running
> javascript.
> 
> Here is one of the simple tasks which I would like to do using the
> application:
> 1. simulate the user activity, visit the website ("https://
> www.abc.com")
> 2. find out the target element by id ("submitBtn") and simulate mouse
> click on the item.
> 3. perform some javascript functions which invoked by click event from
> step 2. ( the function is bind to the item by jquery)
> 4. post ajax request using javascript and if the page has been
> redirected, load the new page content
> 
> Is it possible to do so?
> 
> Thank you in advance.

Depending on exactly what you're trying to test, this may or may not be 
possible.

#1 is easy.  Just get the URL with urllib (or, even better, Kenneth 
Reitz's  very cool requests library (http://docs.python-requests.org/).

#2 gets a little more interesting, but still not that big a deal.  Parse 
the HTML you get back with something like lxml (http://lxml.de/).  
Navigate the DOM with a CSSSelector to find the element you're looking 
for.  Use urllib/requests again to get the href.

#3 is where things get fun.  What, exactly, are you trying to test?  Are 
you trying to test that the js works, or are you really trying to test 
your server and you're just using the embedded js as a means to generate 
the next HTTP request?

If the former, then you really want to be exploring something like 
selenium (http://seleniumhq.org/).  If the latter, then skip all the js 
crap and just go a GET or POST (back to urllib/requests) on the 
appropriate url.

#4 falls into the same bucket as #3.

Once you have figured all this out, you will get a greater appreciation 
for the need to cleanly separate your presentation (HTML, CSS, 
Javascript) from your business logic and data layers.  If there is a 
clean interface between them, it becomes easy to test one in isolation 
from the other.  If not, then you end up playing games with screen 
scraping via lxml and selenium just to test your database queries.  
Which will quickly lead to you running screaming into the night.



More information about the Python-list mailing list