[Web-SIG] Extracting web data
Lennart Regebro
regebro at gmail.com
Tue Feb 22 08:38:10 CET 2011
On Tue, Feb 22, 2011 at 01:52, Aaron Watters <arw1961 at yahoo.com> wrote:
> BeautifulSoup is the standard response.
> I think lxml will not work very well unless the
> html is extremely nicely formatted, but I could
> be wrong.
>
lxml handles broken HTML pretty well.
There are Windows binaries here: http://pypi.python.org/pypi/lxml/2.2.8
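
As a rough illustration of the point about broken HTML, here is a minimal sketch using lxml.html. The markup is made up for the example (unclosed <li>, <a>, <body> and <html> tags, plus an unquoted attribute); how well the repair goes on a real page will depend on the page.

```python
# Sketch only: the markup below is invented for illustration,
# not taken from any real site.
from lxml import html

# Unclosed <li>, <a>, <body> and <html> tags, plus an unquoted href.
broken = "<html><body><ul><li>one<li>two<li><a href=page.html>three</ul>"

# lxml.html uses libxml2's forgiving HTML parser, so this does not raise.
doc = html.fromstring(broken)

# Query the repaired tree as usual.
items = doc.xpath(".//li")
links = doc.xpath(".//a/@href")

# Serialise the tree to see the markup lxml reconstructed.
repaired = html.tostring(doc, pretty_print=True).decode("utf-8")
print(len(items), links)
print(repaired)
```

Printing the serialised tree is a quick way to check whether the parser's guess at the structure matches what you expected before writing extraction code against it.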
//Lennart