HTML parsing/scraping & python

Fuzzyman fuzzyman at gmail.com
Thu Dec 1 03:46:05 EST 2005


The standard library module for fetching HTML is urllib2.
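
As a minimal sketch of the fetching step, something like the following
should work. The helper name fetch_html and the URL in the comment are
made up for illustration. On Python 3 the urllib2 functionality lives in
urllib.request, so the import falls back to that.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 exposes the same urlopen()


def fetch_html(url, timeout=10):
    """Return the raw bytes served at `url` (hypothetical helper name)."""
    response = urllib2.urlopen(url, timeout=timeout)
    try:
        return response.read()
    finally:
        # Always release the underlying connection or file handle.
        response.close()


# Usage, with a placeholder URL:
#     page = fetch_html("http://www.example.com/")
```

The result is undecoded bytes; working out the page's character encoding
is left to whatever does the parsing.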

The best module for scraping the HTML is BeautifulSoup.
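
BeautifulSoup is a third-party package, so its API is not shown here.
To give a rough idea of what scraping involves, here is a sketch that
pulls the links out of a page using only the standard library's
HTMLParser. This is a swapped-in technique, not BeautifulSoup, and the
names LinkCollector and extract_links are invented for the example.

```python
try:
    from HTMLParser import HTMLParser  # Python 2
except ImportError:
    from html.parser import HTMLParser  # Python 3


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (illustrative class name)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href values found in `html_text`, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```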

There is a project called mechanize, built by John Lee on top of
urllib2 and other standard modules.

It will emulate a browser's behaviour, including history, cookies,
basic authentication, etc.

There are several modules for automated form filling - FormEncode being
one.

All the best,


Fuzzyman
http://www.voidspace.org.uk/python/index.shtml



