Web automation

Mike Meyer mwm at mired.org
Tue Nov 8 17:23:41 EST 2005


"Paul Boddie" <paul at boddie.org.uk> writes:
> Mike Meyer wrote:
>> "Paul Boddie" <p... at boddie.org.uk> writes:
>> > The problem on non-Windows systems is the lack of a common (or
>> > enforced) technology for exposing application object models
>> OS X has AppleScript. VM/CMS has Rexx. The Amiga had ARexx when MS was
>> still peddling DOS. Plan 9 has files.
> I knew I should have written "UNIX systems" or "non-Windows but still
> mainstream systems". ;-)

Except OS X is Unix, non-Windows, and still mainstream. Maybe
"non-GUI-intensive systems"?

>> I don't think any of them are "enforced" - then again, I don't think
>> anything enforces exporting objects from Windows applications, either.
> No, but COM is the obvious choice for doing so on Windows. Combine that
> with the component developer mindset and it's likely that some kind of
> object model will be exposed by an application.

I think I pointed out that what all those systems really have in
common is that they come bundled with a way of exporting objects from
applications. Generally, one that's better than embedding an
interpreter in the application, too. There are patches for Linux that
provide the system functionality needed for Plan 9's filesystem export
facilities. If those ever go mainstream, I'll seriously consider
switching to Linux.

> Still, I wouldn't say that automation is necessarily "ass-backwards":
> sometimes you want the additional baggage that the browser can give you
> - witness the occasional comp.lang.python thread about working with
> JavaScript-laden pages - and it's not necessarily automation involving
> the activation of coincidental user interface components (find the
> "Register Later" button in the "Register Now?" pop-up dialogue and
> click on it) that's involved here either.

You know, I recently automated a web task that involved framed,
JavaScript-heavy pages. The pages had a link to a "no JavaScript"
version, but that just returned a server error. Pretty hopeless,
right? Turns out my script doesn't need any of that. I drilled down
through the frames to access the pages with real forms on them, and
examined the JavaScript source to figure out what it was doing - then
did the appropriate POSTs by hand, and it all works quite nicely.
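
For anyone who wants to try the same approach, here is a minimal sketch
using only the Python standard library. It is not the script I used: the
class and function names, the URLs and the field names are all made up
for illustration. It lists the frame sources on a page so you can drill
down, collects the default fields of the form you eventually find, and
builds the POST a browser would have sent.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class PageScanner(HTMLParser):
    """Collect frame sources and form fields from one HTML page.

    Hypothetical helper for illustration, not part of any library.
    """

    def __init__(self):
        super().__init__()
        self.frames = []    # src attributes of <frame>/<iframe> tags
        self.action = None  # action of the first <form> seen
        self.fields = {}    # name -> default value of <input> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("frame", "iframe") and attrs.get("src"):
            self.frames.append(attrs["src"])
        elif tag == "form" and self.action is None:
            self.action = attrs.get("action") or ""
        elif tag == "input" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def scan(html):
    """Parse one page and return the filled-in scanner."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner


def build_post(page_url, scanner, overrides):
    """Build the POST request for the scanned form.

    `overrides` holds the values the page's JavaScript would have set,
    worked out by reading its source.
    """
    data = dict(scanner.fields)
    data.update(overrides)
    url = urljoin(page_url, scanner.action or "")
    body = urlencode(data).encode("ascii")
    return Request(url, data=body, method="POST")
```

In use, you would fetch the top page, call scan() on it, follow each
entry in .frames (resolved with urljoin) until you reach a page whose
.action is set, then hand the result of build_post() to
urllib.request.urlopen(). Anything the JavaScript computes before
submitting goes into the overrides dictionary.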

> Yes, the very architecture of the Web should have made automation tasks
> a lot more open and convenient, but sometimes there's a need for a
> "complete" browser to get at the data.

Well, there are two answers to that. One is to find a modern browser
with a good scripting interface. The other is to provide a scripting
facility with the capabilities of a good browser - which mostly means
JavaScript. CSS is a non-issue, and plugins aren't frequent enough to
be a real problem. Especially since you may not be able to run the
proprietary plugins on your system anyway, so even a full-blown
browser won't help :-(.

For JavaScript - there are standalone implementations available. If I
ever run into a case where I actually have to run JavaScript to deal
with a web automation task, I'll check them out. That no one has
wrapped one for use in Python scripts tends to indicate that the
JavaScript problem isn't as bad as it appears at first.

           <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


