[Web-SIG] Extracting web data

Lennart Regebro regebro at gmail.com
Tue Feb 22 08:38:10 CET 2011


On Tue, Feb 22, 2011 at 01:52, Aaron Watters <arw1961 at yahoo.com> wrote:

> BeautifulSoup is the standard response.
> I think lxml will not work very well unless the
> html is extremely nicely formatted, but I could
> be wrong.
>

lxml handles broken HTML pretty well.
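
As a rough illustration of that claim (a minimal sketch, not a benchmark), the snippet below feeds lxml a deliberately malformed fragment with unclosed <p>, <b> and <a> tags and a missing </html>. The markup and the example.com URL are made up for the example; only lxml.html.fromstring and the standard element methods are used.

```python
from lxml import html

# Deliberately broken markup (made up for this sketch): the first <p> is
# never closed, <b> and <a> are left open, and </html> is missing.
broken = (
    "<html><body>"
    "<p>First paragraph"
    "<p>Second <b>bold<a href='http://example.com/'>link</p>"
    "</body>"
)

# The parser recovers and returns the root element instead of raising.
tree = html.fromstring(broken)

# Collect the text of each recovered paragraph and every link target.
paragraphs = [p.text_content() for p in tree.findall(".//p")]
links = [a.get("href") for a in tree.iter("a")]

print(paragraphs)
print(links)
```

If the parser recovers as expected, both paragraphs and the link target come back even though none of the inline tags were closed.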

There are Windows binaries here: http://pypi.python.org/pypi/lxml/2.2.8

//Lennart