Web automation

Mike Meyer mwm at mired.org
Wed Nov 9 15:41:58 EST 2005


qwweeeit at yahoo.it writes:
> Hi Mike,
> thank you very much for your reply.
> I know that mine could be considered
>> a "very" silly way to define automation.
> but I'm not a purist nor a professional
> programmer.

Yes, but you still need to communicate with other people. Using words
to mean something other than what those people expect them to mean is
a recipe for trouble.

> Besides that, I know that case by case
> every problem can be solved and in more
> "right" way, also in very difficult environments
> (framed, Javascript heavy pages) ... but not by me!

The "right" way is what works for you. I'd call using a higher-level
approach the "easy" way - at least when compared to simulating GUI
events!

> I must confess: before pressing manually 220 times a
> "Next" button and save the data sent by the
> html server (using simply cut/paste), I tried to use
> shell programming, DCOP etc., but in the end
> I reverted to the "by hand" method...
>
> Perhaps if I were an expert like you, I could have
> programmed a small script in a matter of minutes.

I can't say without having looked at your example whether or not
that's possible. I can say that it would probably take more than a few
minutes, having done similar things myself.

Automating web stuff is very fragile in any case. Minor changes in web
formatting can break the automation. Someone already pointed out that
the web isn't well-designed for automation.

> Only after a week I found the solution (twill) but
> I have discovered that also this solution obliges
> to consider every case and program the script
> accordingly (to not mentioning the need of
> disguising it as a browser).
> On the other end, the "cheating" method doesn't
> assure a 100% success, because the html servers can
> have a high degree of "cleverness".

So can your twill script.

> If I would cut out all automatic queries from my server,
> I would also time between them to see if the stream
> of queries ("apparently" coming from a browser) are
> indeed compatible with a browser operated by a
> human being...

So I'd add an automatic - and random - delay between each fetch. No
problem. Well, once I figured out what you were checking for, anyway.
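
A minimal sketch of that random delay in Python, using only the
standard library. The names fetch and fetch_politely, the User-Agent
string and the delay bounds are all made up for illustration; they
don't come from twill or from your script, and the right values depend
on what the server is actually checking for.

```python
import random
import time
import urllib.request


def fetch(url, user_agent="Mozilla/5.0"):
    """Fetch one page and return its body as bytes.

    The User-Agent value is an assumed placeholder, not a recommendation.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return response.read()


def fetch_politely(urls, fetch_page=fetch, min_delay=2.0, max_delay=8.0,
                   sleep=time.sleep, rng=random):
    """Fetch each URL in turn, pausing a random interval between fetches.

    fetch_page and sleep are parameters so the loop can be tried out
    without touching the network or actually waiting.
    """
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first fetch, a random one before the rest.
            sleep(rng.uniform(min_delay, max_delay))
        pages.append(fetch_page(url))
    return pages
```

Calling fetch_politely(list_of_urls) with the defaults waits between 2
and 8 seconds before every fetch after the first.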

> For all this reasons, reluctantly, I am going back to Windows
> and its macro languages.
>
> Unfortunately in Windows there are many more security problems...
>
> If you have a "general" solution for the X world...

Well, what you want is a SMOP (a Simple Matter Of Programming). The problem is, there are easier to use
solutions for almost every case you run into in the real world, so
there's little incentive for providing a "general" (i.e. - low-level,
as you can't force a high-level interface on apps) solution. Scripting
tools on Windows aren't generally as capable - or at least weren't
until relatively recently - so there's more incentive for a low-level
solution to be developed. You happen to be hitting one of the corner
cases where the high-level tools on Unix just aren't up to the job.

      <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


