HTML parsing/scraping & Python

Thu Dec 1 03:46:05 EST 2005

The standard library module for fetching HTML is urllib2.
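
For what it's worth, a minimal sketch of fetching a page might look like the following. The helper names (build_request, fetch_html) and the User-Agent string are made up for illustration. The import fallback is there because Python 3 moved urllib2's contents into urllib.request, and the timeout argument assumes Python 2.6 or later.

```python
try:
    import urllib2  # Python 2: the module named above
except ImportError:
    import urllib.request as urllib2  # Python 3: same classes, new location


def build_request(url, user_agent="example-fetcher/0.1"):
    """Build a Request carrying a custom User-Agent header (hypothetical helper)."""
    return urllib2.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Fetch a URL and return the raw response body (bytes on Python 3)."""
    response = urllib2.urlopen(build_request(url), timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

Calling fetch_html("http://www.example.com/") should return the page source, which can then be handed to whatever parser you prefer.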

The best module for scraping the HTML is BeautifulSoup.
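
As a rough illustration, pulling the links out of a page could be sketched like this. It assumes the current bs4 package (the older releases were imported as "from BeautifulSoup import BeautifulSoup"), and the sample HTML and the extract_links name are invented for the example.

```python
from bs4 import BeautifulSoup  # assumption: the modern bs4 package is installed

# Invented sample document; the second anchor has no href on purpose.
SAMPLE_HTML = """
<html><body>
  <a href="/index.html">Home</a>
  <a name="top">No link here</a>
  <a href="http://example.com/docs"> Docs </a>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"])
            for a in soup.find_all("a", href=True)]
```

The href=True filter skips anchors without a link target, so the named anchor in the sample is ignored.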

There is a project called mechanize, built by John Lee on top of
urllib2 and other standard modules.

It emulates a browser's behaviour - including history, cookies,
basic authentication, etc.

There are several modules for automated form filling - FormEncode being
one.

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml