Stripping scripts from HTML with regular expressions

Nikita the Spider NikitaTheSpider at gmail.com
Thu Apr 10 11:12:36 EDT 2008


In article <mailman.161.1207771905.17997.python-list at python.org>,
 "Reedick, Andrew" <jr9445 at ATT.COM> wrote:

> > -----Original Message-----
> > From: python-list-bounces+jr9445=att.com at python.org [mailto:python-
> > list-bounces+jr9445=att.com at python.org] On Behalf Of Michel Bouwmans
> > Sent: Wednesday, April 09, 2008 3:38 PM
> > To: python-list at python.org
> > Subject: Stripping scripts from HTML with regular expressions
> > 
> > Hey everyone,
> > 
> > I'm trying to strip all script-blocks from a HTML-file using regex.
> > 
> 
> [Insert obligatory comment about using a html specific parser
> (HTMLParser) instead of regexes.]

Yah, seconded. To the OP: use BeautifulSoup or HtmlData unless you like
reinventing wheels.
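
For anyone wondering what the parser route looks like, here is a minimal
sketch using only the standard library's html.parser module (the Python 3
name for the HTMLParser module mentioned above). It is not BeautifulSoup's
or HtmlData's API, and the names ScriptStripper and strip_scripts are
made up for illustration. It re-emits the markup it is fed and drops
<script> elements along with their contents. It assumes reasonably
well-formed input, writes end tags in lowercase, and does not reproduce
processing instructions.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Hypothetical helper: re-emit HTML minus <script> elements."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        elif not self.in_script:
            # get_starttag_text() returns the tag exactly as it appeared.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>; a self-closing script is dropped.
        if tag != "script" and not self.in_script:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        elif not self.in_script:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Script bodies arrive here as plain data and are skipped.
        if not self.in_script:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.in_script:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if not self.in_script:
            self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        if not self.in_script:
            self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def strip_scripts(html):
    """Return html with every <script>...</script> block removed."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = '<p>Hi</p><script type="text/javascript">alert("hi");</script><p>Bye</p>'
    print(strip_scripts(page))
```

The parser tracks when it is inside a script element, so there is no
pattern to get wrong on attributes, odd whitespace or mixed-case tags,
which is where the regex approach usually breaks.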

-- 
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more


