Parsing HTML
Thomas Guettler
guettli at thomas-guettler.de
Fri Jul 9 09:02:26 EDT 2004
On Thu, 08 Jul 2004 17:04:24 +0100, C Gillespie wrote:
> Dear All,
>
> I have hopefully a very simple problem. I wish to parse an html page and
> extract everything between the <body> tags.
>
> E.g.
> <head>
> <body>
> <b>afsdf</b>
> </body>
> </head>
>
> Would give
> <body>
> <b>afsdf</b>
> </body>
>
> I've been playing about with htmllib with no success. Any suggestions?
HTML can be broken in many ways. If you want
a solution that can read most of the HTML found on the
web, you can run the page through tidy and have it output XML.
XML is much easier to handle with SAX or DOM.
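
Here is a minimal sketch of the DOM half of that, assuming the page has
already been run through tidy (for example with its -asxml option) so that
the input is well-formed XHTML. The XHTML sample string and the
extract_body helper are made up for illustration; they are not part of
tidy or of the standard library.

```python
from xml.dom import minidom

# Hypothetical sample: stands in for the well-formed XHTML tidy would emit.
XHTML = """<html>
<head><title>t</title></head>
<body>
<b>afsdf</b>
</body>
</html>"""


def extract_body(xhtml_text):
    """Return the serialized <body> element of a well-formed XHTML string."""
    doc = minidom.parseString(xhtml_text)
    bodies = doc.getElementsByTagName("body")
    if not bodies:
        raise ValueError("no <body> element found")
    # toxml() on an element serializes the element and everything inside it.
    return bodies[0].toxml()


print(extract_body(XHTML))
```

This only works on well-formed input; raw HTML from the web will usually
make minidom raise a parse error, which is why the tidy step comes first.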
Regards,
Thomas
--
Thomas Güttler, http://www.thomas-guettler.de/