[Web-SIG] Python version of WWW::Mechanize

John J Lee jjl at pobox.com
Sun Nov 30 10:17:41 EST 2003


On Sun, 30 Nov 2003, Stuart Langridge wrote:
[...]
> I've put together a first cut of something that works like
> WWW::Mechanize at http://www.kryogenix.org/days/2003/11/30/pybrowser.
> Obviously it'll need a little more work on it, but it seems to work OK
> initially. Do let me know if it doesn't seem to work!

Good, some code!

Some comments:

Is this aimed at the standard library?  xml.dom.ext.reader.HtmlLib?
Unless I'm confused about it (quite likely actually, thanks to PyXML
insisting on fiddling with the xml package instead of creating its own),
that's not part of the standard library.  Is PyXML going to be in by 2.4,
perhaps?  Even then, would 4DOM go in?  The original maintainers have
dropped it, it's slow, and it's not up-to-date with the DOM level 2 spec.
Personally, if I were going to depend on DOM outside the standard library,
I'd want a forms interface that was higher level -- but I've already done
that in DOMForm (though no browser class yet), and I guess it's a matter
of taste whether you like a higher-level forms interface.  What do other
people think?

Why isn't it a subclass of urllib2.OpenerDirector (or, better, of
something like my (untested sketch of a) UserAgent in
http://wwwsearch.sf.net/bits/ua.py)?  Certainly the interface of
OpenerDirector needs to be exposed by Browser (appropriately overridden).
I see no reason why it shouldn't be a subclass, in fact: composition seems
like needless complication.  WWW::Mechanize is a subclass of
LWP::UserAgent, and the author doesn't seem to have run into any problems.
And why is the method analogous to OpenerDirector.open() named .get(),
when the request might be a POST, or the URL might use some completely
different scheme (ftp:, file:...)?
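
To make the subclassing suggestion concrete, here is a rough sketch written
against today's standard library, where OpenerDirector lives in
urllib.request (it was in urllib2 in Python 2).  The Browser class, its
.response attribute and the particular handler list are my assumptions for
illustration only; none of this is taken from pybrowser or ua.py.

```python
import urllib.request


class Browser(urllib.request.OpenerDirector):
    """Hypothetical Browser: an OpenerDirector that remembers its last response."""

    def __init__(self):
        super().__init__()
        # A bare OpenerDirector has no handlers, so register a few stdlib ones.
        # (Assumed selection; a real browser class would want more.)
        for handler_class in (
            urllib.request.HTTPHandler,
            urllib.request.HTTPDefaultErrorHandler,
            urllib.request.HTTPRedirectHandler,
            urllib.request.HTTPErrorProcessor,
            urllib.request.FTPHandler,
            urllib.request.FileHandler,
            urllib.request.UnknownHandler,
        ):
            self.add_handler(handler_class())
        self.response = None

    def open(self, fullurl, data=None, **kwargs):
        # Same name and meaning as OpenerDirector.open(): works for GET or
        # POST (when data is given) and for any scheme with a handler.
        self.response = super().open(fullurl, data, **kwargs)
        return self.response
```

Because the method keeps the name .open(), the same call covers http:,
ftp: and file: URLs, and passing data turns an HTTP request into a POST.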

It uses urlopen, which means Browser state (e.g. cookies) is global.  This
problem goes away if you subclass from OpenerDirector.
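
A small sketch of what per-instance state could look like, again using
current stdlib names (urllib.request and http.cookiejar).  The make_browser
helper is a made-up name for illustration.  Each opener gets its own cookie
jar, so nothing is shared through the module-level opener that urlopen uses.

```python
import http.cookiejar
import urllib.request


def make_browser():
    """Return (opener, jar); every call builds a private cookie jar.

    Hypothetical helper: the point is only that state hangs off the
    opener instance rather than off a module-level global.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

Two browsers created this way can be logged in to the same site as
different users without treading on each other's cookies.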

No multipart/form-data encoding?
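
For reference, the encoding itself is not much code.  Below is a minimal
sketch, assuming text fields are UTF-8 strings and files are given as
(filename, bytes, content type) tuples.  The encode_multipart name and its
argument shapes are my invention, and it does no escaping of quotes in
names, so treat it as an outline rather than something to ship.

```python
import uuid


def encode_multipart(fields, files):
    """Encode form data as multipart/form-data (illustrative sketch).

    fields: dict of name -> str
    files:  dict of name -> (filename, content_bytes, content_type)
    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ]
    for name, (filename, content, content_type) in files.items():
        lines += [
            delimiter,
            (
                f'Content-Disposition: form-data; name="{name}"; '
                f'filename="{filename}"'
            ).encode("utf-8"),
            f"Content-Type: {content_type}".encode("ascii"),
            b"",
            content,
        ]
    # Closing delimiter, followed by a final CRLF.
    lines += [delimiter + b"--", b""]
    body = b"\r\n".join(lines)
    return f"multipart/form-data; boundary={boundary}", body
```

The returned header value goes in Content-Type and the body is what gets
passed as the request data.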

I think there has to be some way of (optionally) linking up any browser
class to tidylib.

Any tests?

No .forward() / .backward() methods?
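
Something along these lines would probably do for the bookkeeping.  It is
a sketch of a history list only; the History class and its visit() method
are hypothetical names, and a real Browser would presumably re-fetch or
cache the page as well as returning the URL.

```python
class History:
    """Illustrative visited-URL list with a movable current position."""

    def __init__(self):
        self._urls = []
        self._pos = -1  # index of the current page; -1 means nothing visited

    def visit(self, url):
        # Visiting a new page discards anything "ahead" of the current one,
        # the way graphical browsers behave.
        del self._urls[self._pos + 1:]
        self._urls.append(url)
        self._pos += 1

    def backward(self):
        if self._pos <= 0:
            raise IndexError("no earlier page in history")
        self._pos -= 1
        return self._urls[self._pos]

    def forward(self):
        if self._pos + 1 >= len(self._urls):
            raise IndexError("no later page in history")
        self._pos += 1
        return self._urls[self._pos]
```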

I think it's useful to have a separate nr argument for follow_link so you
can do (as in WWW::Mechanize):

  browser.follow_link("download", nr=3)
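
Roughly what I have in mind for the link selection, as a sketch using the
current stdlib html.parser module.  The find_link function is a
hypothetical stand-in for the lookup inside follow_link.  Two assumptions
to note: it matches on a substring of the link text, and nr counts from
zero.  Either could reasonably be done differently.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_link(html, text, nr=0):
    """Return the href of the nr-th link (0-based, assumed) containing text."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    matches = [href for href, label in collector.links if text in label]
    return matches[nr]  # IndexError if there are not enough matches
```

Keeping nr as a separate keyword means the text argument stays a plain
string and the "which one" question does not get mixed into it.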


John


