[newbie] Is Python what I'm looking for?

Gerhard Häring gh_pythonlist at gmx.de
Fri May 24 19:29:18 EDT 2002


* Giulio Cespuglio <giulio.agostini.remove.this at libero.it> [2002-05-24 22:05 +0000]:
> Hi there,
> 
> My aim is to automatically get specific pieces of information from a
> website, simulating the behaviour of a user filling in HTML forms and
> clicking buttons (a web robot?), then embed them in my HTML page.
> In other words, the pages I need to access are not accessible from a
> standard URL.
> The other part of the problem is of course parsing the resulting HTML
> and extracting the pieces of info I need.
> 
> Does Python provide libraries that could help me?

Yes, chances are that everything you need for this is already available
in Python's standard library:

- urllib, urllib2, Cookie for the web robot part
- htmllib, sgmllib for the HTML parsing part
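
The module names above are the Python 2 ones. In Python 3, htmllib and sgmllib no longer exist, and html.parser is their successor. Here is a minimal sketch of the parsing part. It assumes Python 3 and that the information you want sits in elements with a known class attribute. The names ClassTextExtractor and extract_by_class are made up for illustration:

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect depth counting.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element whose class attribute
    includes `wanted_class` (a hypothetical target such as "price")."""

    def __init__(self, wanted_class):
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self.wanted_class = wanted_class
        self.depth = 0      # > 0 while inside a matching element
        self.current = []   # text fragments of the element being read
        self.results = []   # finished, stripped strings in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.depth:
            self.depth += 1  # nested tag inside a matching element
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # the matching element itself just closed
            self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_by_class(html_text, wanted_class):
    """Return the text of each element carrying `wanted_class`."""
    parser = ClassTextExtractor(wanted_class)
    parser.feed(html_text)
    parser.close()
    return parser.results
```

Something like `extract_by_class(page_html, "price")` then gives you the strings to put into your own page. It copes with sloppy markup better than an XML parser would, but badly mismatched tags can still throw off the depth counting.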

If your HTML input is XHTML, and thus well-formed XML (unfortunately, that's
unlikely), you could also use Python's XML libraries to parse it.
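
For the XHTML case, here is a sketch using xml.etree.ElementTree, which entered the standard library after this post was written. It assumes Python 3 and a well-formed document. XHTML elements are in the namespace http://www.w3.org/1999/xhtml, so tag names must carry that prefix. Picking out table cells (`td`) is just an illustrative choice, and xhtml_cell_texts is a hypothetical name:

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace; ElementTree names them "{uri}tag".
XHTML = "{http://www.w3.org/1999/xhtml}"


def xhtml_cell_texts(xhtml_text):
    """Return the text of every <td> in a well-formed XHTML document."""
    root = ET.fromstring(xhtml_text)  # raises ET.ParseError if not well-formed
    return ["".join(td.itertext()).strip() for td in root.iter(XHTML + "td")]
```

If the page turns out not to be well-formed, `ET.fromstring` raises `ET.ParseError`, and you are back to an HTML parser.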

> give me some keywords/pointers?

An example of submitting a form with urllib is included in the Python
documentation:

http://www.python.org/doc/current/lib/node307.html
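
The documentation example linked above is from the Python 2 era. In Python 3 the same job is done by urllib.parse and urllib.request, with http.cookiejar storing cookies the site sets and sending them back on later requests. Here is a minimal sketch under that assumption. The helper names make_opener and build_form_request are hypothetical, and the form's action URL and field names have to be read from the real page's `<form>` markup:

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_opener():
    """Return an opener that keeps cookies between requests, plus its jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_form_request(action_url, fields, method="POST"):
    """Encode `fields` as a browser would (application/x-www-form-urlencoded)
    and wrap them in a Request for the form's action URL."""
    encoded = urllib.parse.urlencode(fields)
    if method.upper() == "GET":
        # GET forms put the fields in the query string.
        sep = "&" if "?" in action_url else "?"
        return urllib.request.Request(action_url + sep + encoded, method="GET")
    # POST forms send the fields as the request body (bytes, not str).
    return urllib.request.Request(
        action_url,
        data=encoded.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the request to `opener.open(req, timeout=30)` sends it and returns the response. Reusing the same opener for every step keeps any session cookie. That is often what makes pages reachable that have no standard URL of their own.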

> I'm completely new to Python.  I would of course set up my web server
> under windows (Apache?) and the necessary plugin.

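Apache is easy to set up. On Windows, add this line at the end of Apache's
httpd.conf to enable Python CGI support. It tells Apache to find the
interpreter for a CGI script through the file-extension association in the
Windows registry, so .py scripts are run by the program associated with .py
files (normally python.exe):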

ScriptInterpreterSource Registry

Then you can put Python CGI scripts ending in .py in Apache's cgi-bin
directory.
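
Here is a minimal sketch of such a script, assuming Python 3. A CGI script prints a header block, then a blank line, then the page. The function render_page and the placeholder items are hypothetical; in practice the items would come from the fetching and parsing steps. The shebang line matters on Unix-like systems. With ScriptInterpreterSource Registry, Apache on Windows consults the .py file association in the registry first.

```python
#!/usr/bin/env python3
import html
import sys


def render_page(items):
    """Build a complete CGI response: headers, blank line, then the HTML body."""
    rows = "\n".join("<li>%s</li>" % html.escape(item) for item in items)
    body = ("<html><head><title>Report</title></head><body>\n"
            "<ul>\n%s\n</ul>\n</body></html>\n" % rows)
    return "Content-Type: text/html; charset=utf-8\r\n\r\n" + body


def main():
    # Placeholder data for illustration; escaping keeps "<" and "&" safe.
    items = ["first result", "second <result>"]
    sys.stdout.write(render_page(items))


if __name__ == "__main__":
    main()
```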

> Can you think of a better way of doing this? Another scripting
> language perhaps?

I've heard that Perl has a nice module for writing web robots, but I don't
remember what it's called. I wouldn't recommend learning Perl, though. But
maybe your brain is wired differently from mine and you'll like it.

I'm pretty confident that you'll like Python better :-)

Gerhard
-- 
This sig powered by Python!
Outside temperature in München: 12.1 °C      Wind: 3.4 m/s





More information about the Python-list mailing list