html + javascript automations = [mechanize + ?? ] or something else?

Benjamin Niemann pink at odahoda.de
Tue Jan 16 04:20:06 EST 2007


Hello,

John wrote:

> John wrote:
>> I have to write a spider for a webpage that uses html + javascript. I
>> had it written using mechanize, but the authors of the webpage now use
>> a lot of javascript. Mechanize can no longer do the job.
>> Does anyone know how I could automate my spider to understand
>> javascript? Is there a way to control a browser like firefox from
>> python itself? How about IE? That way, we do not have to go through
>> something like mechanize.
> 
> I am curious about the webbrowser module. I can open up firefox
> using webbrowser.open(), but can one control it? Say enter a
> login / passwd on a webpage? Send keystrokes to firefox?
> mouse clicks?

Not with the webbrowser module - it can only launch a browser.
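
To illustrate the limitation, here is a minimal sketch. The helper name
launch() is made up for this example. webbrowser.open() hands the URL to the
default browser and returns only a success flag, so there is no object left
over through which you could send keystrokes or clicks.

```python
import webbrowser


def launch(url):
    """Ask the default browser to display url (hypothetical helper).

    The return value is only True or False, depending on whether a browser
    could be launched. No handle to the browser window is returned.
    """
    return webbrowser.open(url)
```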

On the mechanize website you will also find DOMForm
<http://wwwsearch.sourceforge.net/DOMForm/>, a web scraper with basic JS
support (using the SpiderMonkey engine from the Mozilla project). Note,
however, that DOMForm is at an early stage and no longer developed
(according to the site; I have never used it myself).

You could try to script IE (perhaps also FF, I don't know) through COM, using
the pywin32 module <https://sourceforge.net/projects/pywin32/>. The details
are a Windows issue; you may find help and documentation in Windows-specific
groups or mailing lists, on MSDN, and so on. COM calls written in VB, C#,
etc. can usually be translated quite directly to Python.
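
A rough sketch of what the COM route could look like, assuming Windows with
pywin32 installed. The helper names (open_ie, wait_until_loaded, login) and
the element ids passed in are hypothetical and depend entirely on the page
you are scripting. The calls on the ie object (Visible, Navigate, Busy,
ReadyState, Document) are the ones exposed by the
"InternetExplorer.Application" automation object. Treat it as a starting
point, not a tested solution.

```python
import time

READYSTATE_COMPLETE = 4  # IE's ReadyState value once a page has fully loaded


def open_ie(visible=True):
    """Start Internet Explorer through COM (Windows with pywin32 only)."""
    # Imported here so that the module still loads on other platforms.
    import win32com.client
    ie = win32com.client.Dispatch("InternetExplorer.Application")
    ie.Visible = visible
    return ie


def wait_until_loaded(ie, timeout=30.0, poll=0.1):
    """Poll the browser until it reports the page as completely loaded."""
    deadline = time.time() + timeout
    while ie.Busy or ie.ReadyState != READYSTATE_COMPLETE:
        if time.time() > deadline:
            raise RuntimeError("page did not finish loading in time")
        time.sleep(poll)


def login(ie, url, form_id, fields):
    """Open url, fill the inputs named in fields, and submit the form.

    form_id and the keys of fields are element ids in the page's HTML;
    they are assumptions about the target page, not fixed names.
    """
    ie.Navigate(url)
    wait_until_loaded(ie)
    doc = ie.Document
    for element_id, value in fields.items():
        doc.getElementById(element_id).value = value
    doc.getElementById(form_id).submit()
    wait_until_loaded(ie)
```

On a Windows machine this might be used as ie = open_ie() followed by
login(ie, "http://example.invalid/login", "loginform",
{"user": "john", "passwd": "secret"}), where the URL and the ids are
placeholders.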


HTH

-- 
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/


