HTML Parsing

Dan Stromberg dstromberglists at gmail.com
Sat Jun 28 22:38:22 EDT 2008


On Sat, 28 Jun 2008 19:03:39 -0700, disappearedng wrote:

> Hi everyone
> I am trying to build my own web crawler for an experiment, and I don't
> know how to access the HTTP protocol from Python.
> 
> Also, are there any open-source HTML parsing engines available in
> Python? That would be great.

Check out BeautifulSoup.  I don't recall what license it uses, but the
source is available, and it deals well with HTML that is not necessarily
beautiful inside (unclosed or badly nested tags, for example).
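
For the HTTP half of the question, here is a minimal sketch, assuming
Python 3 and only the standard library: urllib.request does the fetching,
and html.parser stands in for BeautifulSoup (a third-party install) to
pull the links out of a page. The names LinkCollector, extract_links and
fetch are made up for illustration, and example.com is only a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs for every <a href=...> found in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    """Download a page and decode it to text (UTF-8 if no charset is sent)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical usage (performs a real network request):
#     start = "http://example.com/"
#     for link in extract_links(fetch(start), start):
#         print(link)
```

With BeautifulSoup installed (the bs4 package), the link extraction could
instead be written as BeautifulSoup(html, "html.parser").find_all("a",
href=True), which is generally more forgiving of messy markup. A real
crawler would also want to honor robots.txt and keep track of the URLs it
has already visited; neither is shown here.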
