re pattern for matching JS/CSS

i80and i80and at gmail.com
Fri Dec 15 11:56:06 EST 2006


I'm working on a program to remove tags from an HTML document, leaving
just the content, but I want to do it simply.  I've finished a system
to remove simple tags, but I want all CSS and JS to be removed.  What
re pattern could I use to do that?

I've tried
'<script[\S\s]*/script>'
but that didn't work properly.  My knowledge of Python is fairly basic,
so I'm still learning the re module.
What pattern would work?
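
For reference, a minimal sketch of what probably goes wrong with the
attempt above and one possible alternative.  The quantifier in
[\S\s]* is greedy, so when a page has more than one script block the
match runs from the first <script to the last /script> and swallows
the content in between; the pattern also never looks at <style>.  The
sample string and the names greedy, block and strip_js_css below are
made up for illustration and are not from the original post.  A regex
like this is only an approximation, not a real HTML parser.

```python
import re

# Hypothetical sample input, invented for this illustration.
html = (
    "<p>one</p>"
    "<script type='text/javascript'>var a = 1;</script>"
    "<p>two</p>"
    "<style>p { color: red; }</style>"
    "<script>var b = 2;</script>"
    "<p>three</p>"
)

# The pattern from the post: [\S\s]* is greedy, so it matches from the
# first "<script" through the LAST "/script>" in the document.
greedy = re.compile(r'<script[\S\s]*/script>')

# One possible alternative (an assumption, not the only way to do it):
# - (script|style) handles both kinds of block,
# - [^>]* skips any attributes in the opening tag,
# - .*? is non-greedy, so it stops at the nearest closing tag,
# - \1 requires the closing tag to match the opening tag's name,
# - DOTALL lets "." span newlines, IGNORECASE accepts <SCRIPT> etc.
block = re.compile(r'<(script|style)\b[^>]*>.*?</\1\s*>',
                   re.IGNORECASE | re.DOTALL)


def strip_js_css(text):
    """Remove <script>...</script> and <style>...</style> blocks."""
    return block.sub('', text)


greedy_result = greedy.sub('', html)   # loses the "two" paragraph as well
clean_result = strip_js_css(html)      # keeps all three paragraphs
print(greedy_result)
print(clean_result)
```

This will still be fooled by things like a literal "</script>" inside a
JavaScript string or by malformed markup, so for anything beyond simple
pages a real parser (for example the standard library's HTML parser
module) is likely the safer route.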



