HTML Parsing and Indexing

Stefan Behnel stefan.behnel-n05pAM at web.de
Tue Nov 14 02:50:57 EST 2006


mailtogops at gmail.com wrote:
>     I am involved in a project that collects news
> published on selected, known web sites in HTML, RSS, and similar
> formats, shortlists the items, and creates a bookmark on our website
> for the news content (we will use django for web development). Currently
> this project is under heavy development.
> 
> I need help with an HTML parser.

lxml includes an HTML parser which can parse straight from URLs.

http://codespeak.net/lxml/
http://cheeseshop.python.org/pypi/lxml
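
A minimal sketch of what that could look like for collecting links from a news page. The function name extract_links and the sample markup are made up for illustration. etree.HTMLParser, etree.parse and xpath are part of lxml. The sample is parsed from an in-memory file object so the snippet runs without network access.

```python
import io

from lxml import etree


def extract_links(source):
    """Parse HTML from a filename, URL, or file-like object.

    Returns a list of (link text, href) pairs for every <a> that has an href.
    """
    parser = etree.HTMLParser()          # tolerant parser for real-world HTML
    tree = etree.parse(source, parser)
    return [
        ("".join(a.itertext()).strip(), a.get("href"))
        for a in tree.xpath("//a[@href]")
    ]


# Hypothetical sample page; note the unclosed <p> and the anchor without href.
SAMPLE = b"<html><body><p>News: <a href='/story/1'>First story</a><a>no link</a></body></html>"

links = extract_links(io.BytesIO(SAMPLE))
print(links)
```

Passing a URL string as source instead of the file object should let the underlying libxml2 fetch the page itself. As far as I know that only covers plain http and depends on how libxml2 was built, so for https pages it is probably safer to fetch the page with urllib and hand the response object to extract_links.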

Stefan


