HTML DOM parser?

Sat Aug 2 09:34:36 EDT 2003

Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:

> calfdog at yahoo.com writes:
> > Here is a quick example of using automation with IE
> > # This is a sample of automating IE using Python.
> 
> Thanks, I should have said I'm running under gnu/linux and I was
> hoping for a standalone solution (some of the ones suggested sound
> worth looking into).  Even connecting up Python to Mozilla sounds
> awfully heavyweight.

PyKDE is less hassle, I think, though it's certainly heavyweight.
HttpUnit on Jython is probably lighter.  I haven't used either, but I
compiled PyKDE recently and didn't run into problems (if you're
unlucky, though, you may have to compile Qt, KDE, sip and PyQt first!).

I seem to have got a basic JavaScript wrapper working now (I'm using
libjs from Mozilla's standalone spidermonkey distribution), bound 4DOM
to it, and extracted & executed the script from a web page.  Quite a
lot more to do, though (browser-like interface of some sort,
javascript: scheme URLs, implement window object, wiring up event
attributes to the JS interpreter, getting the DOM actually working
properly, understanding what document.write does, trying to connect
the DOM to my Python HTML form and HTTP cookies interfaces...).
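
A minimal sketch of just the extraction step, assuming Python's
standard html.parser module in place of 4DOM (a substitution for
illustration only; ScriptExtractor and extract_scripts are made-up
names).  Executing the extracted text would still need the JS wrapper:

```python
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Collect the text of inline <script> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.scripts = []         # one string per inline <script> element
        self._in_script = False   # True while inside an inline <script>
        self._chunks = []         # pieces of the current script body

    def handle_starttag(self, tag, attrs):
        # External scripts (with a src attribute) have no inline body to collect.
        if tag == "script" and "src" not in dict(attrs):
            self._in_script = True
            self._chunks = []

    def handle_data(self, data):
        # The parser may deliver a script body in several pieces.
        if self._in_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self.scripts.append("".join(self._chunks))
            self._in_script = False


def extract_scripts(html):
    """Return the source text of every inline script in an HTML string."""
    parser = ScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser.scripts
```

Each returned string could then be handed to whatever JavaScript
interpreter binding is available.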

Anybody happen to know where JavaScript's document.some_form is
documented?  Official W3C DOM has document.forms, but real browser
DOMs apparently have forms directly on the document object.
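
To make the difference concrete, here is a toy Python sketch of the
two access styles.  Form and Document are hypothetical stand-ins, not
4DOM classes, and the lookup rule is only my guess at what browsers do:

```python
class Form:
    """Stand-in for a form element; only the name matters here."""

    def __init__(self, name):
        self.name = name


class Document:
    """Toy document object, not a real DOM implementation."""

    def __init__(self, forms):
        self.forms = list(forms)   # W3C-style collection: document.forms

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so this emulates
        # the browser-style shortcut document.<form name>.
        for form in self.__dict__.get("forms", []):
            if form.name == name:
                return form
        raise AttributeError(name)


login = Form("login")
doc = Document([login, Form("search")])
assert doc.forms[0] is login   # W3C DOM style
assert doc.login is login      # browser-style shortcut
```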

John