Mechanoid Web Browser - Recording Capability

John J. Lee jjl at pobox.com
Sat Sep 16 20:15:44 EDT 2006


"Seymour" <seymour.morris at gmail.com> writes:

> I am trying to find a way to sign onto my Wall Street Journal account
> (http://online.wsj.com/public/us) and automatically download various
> financial pages on stocks and mutual funds that I am interested in
> tracking.  I have a subscription to this site and am trying to figure
[...]
> My questions are:
> 1. Is there an easier way to grab these pages from a password protected
> site, or is the use of Mechanoid a reasonable approach?

This is the first time I've heard of anybody using mechanoid.  As the
author of mechanize, of which mechanoid is a fork, I have always been
in the dark about why its author decided to fork it (he hasn't emailed
me...).

I don't know if there's any activity on the mechanoid project, but I'm
certainly still working on mechanize, and there's an active mailing list:

http://wwwsearch.sourceforge.net/

https://lists.sourceforge.net/lists/listinfo/wwwsearch-general


> 2. Is there an easy way of recording a web surfing session in Firefox
> to see what the browser sends to the site?  I am thinking that this
> might help me better understand the Mechanoid commands, and more easily
> program it.  I do a fair amount of VBA Programming in Microsoft Excel
> and have always found the Macro Recording feature a very useful
> starting point which has greatly helped me get up to speed.

With Firefox, you can use the LiveHTTPHeaders extension:

http://livehttpheaders.mozdev.org/


The mechanize docs explain how to turn on display of HTTP headers that
it sends.
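
As a rough illustration of the same idea, here is a minimal sketch that
uses only the Python standard library (urllib.request), not mechanize
itself; mechanize has its own switch for this, described in its docs.
The function name build_debug_opener and the example.com URL are made up
for the example.

```python
import urllib.request


def build_debug_opener():
    """Return an opener that prints the HTTP traffic it handles.

    With debuglevel=1 the underlying http.client connection prints the
    request it sends and the response headers it receives to stdout.
    """
    http_handler = urllib.request.HTTPHandler(debuglevel=1)
    https_handler = urllib.request.HTTPSHandler(debuglevel=1)
    # build_opener() uses these instances in place of its default
    # HTTP and HTTPS handlers.
    return urllib.request.build_opener(http_handler, https_handler)


# Usage (needs network access; the URL is only a placeholder):
#
#   opener = build_debug_opener()
#   with opener.open("http://example.com/") as response:
#       response.read()
#
# In mechanize the rough equivalent is calling set_debug_http(True)
# on a Browser object.
```

Comparing that output with what LiveHTTPHeaders shows for the same page
is a quick way to spot a header or cookie that the script is not sending.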


Going further, certainly there's at least one HTTP-based recorder for
twill, which actually watches your browser traffic and generates twill
code for you (twill is a simple language for functional testing and
scraping built on top of mechanize):

http://twill.idyll.org/

http://darcs.idyll.org/%7Et/projects/scotch/doc/


That's not an entirely reliable process, but some people might find it
helpful.

I think there may be one for zope.testbrowser too (or ZopeTestBrowser
(sp?), the standalone version that works without Zope) -- I'm not
sure.  (zope.testbrowser is also built on mechanize.)  Despite the
name, I'm told this can be used for scraping as well as testing.

I would imagine that it would be fairly easy to modify or extend
Selenium IDE to emit mechanize or twill or zope.testbrowser (etc.)
code (perhaps without any coding: I've used too many Firefox Selenium
plugins and now forget which had which features).  Personally I would
avoid using Selenium itself to actually automate tasks, though, since
unlike mechanize &c., Selenium drags in an entire browser, which
brings with it some inflexibility (though not as bad as in the past).
It does have advantages though: most obviously, it knows JavaScript.


John


