Making search on the other site and getting data and writing in xml

Paul Boddie paul at boddie.org.uk
Wed Sep 27 12:45:59 EDT 2006


George Sakkis wrote:
> altemurbugra at gmail.com wrote:
>
> > I dont mean google
> > i dont mean onelook.com
> >
> > these are only examples
> >
> > i hope you understand what i mean
>
> Apparently, *you* don't understand what they're trying to tell you. It
> roughly boils down to the following:

If we just step back from the brink for a moment and give the
questioner the benefit of the doubt - that the exercise merely involves
automating some kind of interaction that would otherwise require lots
of manual messing around piloting a browser, rather than performing
some kind of bulk "suck down" of an entire site's information - then
any of the following techniques could be used:

  * Use a well-known mirroring or archiving tool such as wget.
  * Use various testing tools, some of which are written in Python.
  * Use urllib, urllib2 or httplib plus an HTML or XML parser in your
    own program.
  * Automate a Web browser using some off-the-shelf program.
  * Use various automation mechanisms provided by your environment
    (e.g. COM, DCOP), possibly with Python libraries (e.g. PAMIE [1],
    KPart Plugins [2]).
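
To make the third option a little more concrete, here is a minimal
sketch, assuming a current Python 3 where the functionality of urllib2
lives in urllib.request. It fetches one results page, pulls out the
links with the standard library's HTML parser, and writes them out as
XML. The "q" query parameter, the User-Agent string, the element names
in the output, and the names LinkCollector, fetch and links_to_xml are
all invented for illustration; a real site will need its own URL scheme
and its own idea of which parts of the page matter.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def fetch(base_url, query):
    """Fetch one page of results; the 'q' parameter is an assumption."""
    url = base_url + "?" + urlencode({"q": query})
    request = Request(url, headers={"User-Agent": "example-script/0.1"})
    with urlopen(request) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def links_to_xml(html, query):
    """Return the links found in the HTML as an XML document string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    root = ET.Element("results", {"query": query})
    for href, text in collector.links:
        item = ET.SubElement(root, "link", {"href": href})
        item.text = text
    return ET.tostring(root, encoding="unicode")
```

Something like links_to_xml(fetch(base_url, "word"), "word") would then
tie the two halves together, with base_url being whatever search page
you are permitted to query. Keeping the fetching separate from the
parsing also means the parsing can be tried out on a saved page without
touching the site at all.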

Various sites forbid wget and friends as a rule, understandably, but
there are sometimes reasons why you might want to use various tools to
automate a procedure involving lots of data which would waste a huge
amount of time if done manually. Perhaps you might have mail residing
in a Webmail system which can't be extracted via any process other than
reading all the messages in a browser, for example, or perhaps your
favourite Internet applications don't provide decent shortcuts to the
information you need, instead believing that it's all about the
"experience": surfing around watching all the animated adverts.
Automation and related technologies can legitimately help users regain
control of their Internet-resident data and make better use of the
services around it.

Paul

[1] http://pamie.sourceforge.net/
[2] http://www.boddie.org.uk/python/kpartplugins.html