how to get rid of html tags
Ian Bicking
ianb at colorstudy.com
Thu Oct 3 00:33:27 EDT 2002
The easy answer:
import re
page = re.sub(r'<.*?>', '', page, flags=re.DOTALL)
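Here is the same one-liner wrapped as a small function, as a minimal sketch. The names strip_tags and TAG_RE are just illustrative. re.DOTALL is an addition so that a tag split across lines is still matched. The last example shows where the regex breaks: a quoted attribute value containing ">".

```python
import re

# Non-greedy match from '<' up to the nearest '>'.
# re.DOTALL lets '.' match newlines, so a tag spanning several lines is removed too.
TAG_RE = re.compile(r'<.*?>', re.DOTALL)

def strip_tags(page):
    """Remove anything that looks like a tag; entities such as &amp; are left alone."""
    return TAG_RE.sub('', page)

print(strip_tags('<p>Hello, <b>world</b>!</p>'))   # Hello, world!
print(strip_tags('<a href="x"\n   title="y">link</a>'))   # link
# The '>' inside the quoted attribute ends the match early:
print(strip_tags('<p title="a>b">text</p>'))   # b">text
```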
There may be more Correct answers, though. (Some HTML has unquoted < and >
characters, which browsers accept even though they're super annoying to
parse; a regex like the one above will mangle them. But I don't know that
htmllib parses improper HTML any better.)
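htmllib no longer exists in Python 3. A rough sketch of the parser-based approach follows, using the standard library's html.parser.HTMLParser in its place. It tokenizes quoted attributes properly and decodes entities. Skipping <script> and <style> contents is an extra assumption, not part of the original question. TextExtractor and html_to_text are hypothetical names.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {'script', 'style'}  # assumption: these are not "content"

    def __init__(self):
        # convert_charrefs=True turns &amp;, &lt;, etc. into real characters
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # tag names arrive lowercased
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def html_to_text(page):
    parser = TextExtractor()
    parser.feed(page)
    parser.close()  # flush any buffered trailing text
    return ''.join(parser.parts)

print(html_to_text('<p title="a>b">Fish &amp; chips</p>'))  # Fish & chips
```

Unlike the regex, the quoted '>' in the attribute is handled correctly here. The parser is still not a browser, so badly broken markup may produce odd results.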
On Wed, 2002-10-02 at 20:04, koko wrote:
> I am trying to retrieve a web page.
> But I only want to keep the content of the webpage without the html tags.
> How can I parse the webpage to get rid of the tags?
More information about the Python-list
mailing list