re pattern for matching JS/CSS

ina erinhouston at gmail.com
Fri Dec 15 11:45:52 EST 2006


i80and wrote:
> I'm working on a program to remove tags from a HTML document, leaving
> just the content, but I want to do it simply.  I've finished a system
> to remove simple tags, but I want all CSS and JS to be removed.  What
> re pattern could I use to do that?
>
> I've tried
> '<script[\S\s]*/script>'
> but that didn't work properly.  I'm fairly basic in my knowledge of
> Python, so I'm still trying to learn re.
> What pattern would work?

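I use  re.compile("<script.*?</script>", re.DOTALL)
for scripts.  The non-greedy .*? stops at the first closing tag, and
re.DOTALL lets . match newlines too.  I strip scripts out first, since
my tag-stripping re would otherwise remove just the <script> tags
themselves and leave the script body behind.  Hope this helps.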
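
Here is a minimal sketch of that order of operations, extended to
<style> blocks since the question also asks about CSS.  The names
SCRIPT_RE, STYLE_RE, TAG_RE and strip_html are made up for
illustration, TAG_RE is only a stand-in for whatever simple tag
stripper you already have, and the re.IGNORECASE flag is my own
addition.  It also shows why the greedy pattern from the question
misbehaves: [\S\s]* runs to the last "/script>" in the document, so
any content between two script blocks is removed as well.

```python
import re

# DOTALL lets "." match newlines; ".*?" is non-greedy, so each match
# ends at the first closing tag. IGNORECASE is an added assumption.
SCRIPT_RE = re.compile(r"<script\b.*?</script\s*>", re.DOTALL | re.IGNORECASE)
STYLE_RE = re.compile(r"<style\b.*?</style\s*>", re.DOTALL | re.IGNORECASE)

# Hypothetical stand-in for an existing "simple tag" stripper.
TAG_RE = re.compile(r"<[^>]+>")

# The greedy pattern from the question, kept for comparison.
GREEDY_RE = re.compile(r"<script[\S\s]*/script>")


def strip_html(html):
    """Remove script and style blocks first, then any remaining tags."""
    text = SCRIPT_RE.sub("", html)
    text = STYLE_RE.sub("", text)
    return TAG_RE.sub("", text)


sample = (
    "<p>Hello</p>"
    "<script>\nvar a = 1;\n</script>"
    "<style>p { color: red }</style>"
    "<p>World</p>"
    "<script>var b = 2;</script>"
)

print(strip_html(sample))           # HelloWorld
print(GREEDY_RE.sub("", sample))    # <p>Hello</p>
```

This is fine for quick cleanup, but regular expressions can be fooled
by unusual markup (for example "</script>" inside a JavaScript
string), so for arbitrary pages a real parser such as the standard
library's html.parser module is the safer choice.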



