re pattern for matching JS/CSS

Tim Chase python.list at tim.thechases.com
Fri Dec 15 11:52:40 EST 2006


>> I've tried
>> '<script[\S\s]*/script>'
>> but that didn't work properly.  I'm fairly basic in my knowledge of
>> Python, so I'm still trying to learn re.
>> What pattern would work?
> 
> I use  re.compile("<script.*?</script>",re.DOTALL)
> for scripts.  I strip this out first since my tag stripping re will
> strip out script tags as well.  Hope this was of help.

This won't catch whitespace variations such as

	<
	script
	>
	doEvil()
	<
	/
	script
	>

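Whitespace directly after the "<" is not strictly valid html/xhtml, 
but whitespace before the closing ">" is, so even a plain 
"</script >" slips past the pattern.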
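
As a quick check of the point above, here is a minimal sketch that 
runs the quoted pattern over two whitespace variations.  The input 
strings are made up for illustration; only the pattern comes from 
the thread.

```python
import re

# The pattern quoted earlier in the thread, unchanged.
SCRIPT_RE = re.compile("<script.*?</script>", re.DOTALL)

# Illustrative inputs (assumptions, not taken from a real page):
# 1. the fully spread-out form shown above
# 2. an ordinary script element with a line break before the final ">"
spread_out = "<\nscript\n>\ndoEvil()\n<\n/\nscript\n>"
end_tag_ws = "<p>hi</p><script>doEvil()</script\n>"

for sample in (spread_out, end_tag_ws):
    cleaned = SCRIPT_RE.sub("", sample)
    # The pattern needs the literal text "</script>", so neither
    # sample matches and both come back untouched.
    print(cleaned == sample)
```

Both lines print True: the pattern removes nothing, so doEvil() is 
still in the output.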

For less valid html that an attacker might still attempt, one 
might find something like

	<scrip<script>hah</script>t>doEvil()</script>

which, once you strip out whatever your pattern matches, leaves the valid but unwanted

	<script>doEvil()</script>
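
A small sketch of that effect, using the pattern quoted above on the 
nested string.  The helper name strip_until_stable is hypothetical 
and only there to show that a second pass changes the result.

```python
import re

# The pattern quoted earlier in the thread, unchanged.
SCRIPT_RE = re.compile("<script.*?</script>", re.DOTALL)

nested = "<scrip<script>hah</script>t>doEvil()</script>"

# One pass removes only the inner "<script>hah</script>", and the
# leftover pieces join up into a fresh script element.
once = SCRIPT_RE.sub("", nested)
print(once)  # <script>doEvil()</script>


def strip_until_stable(text):
    """Hypothetical helper: re-apply the pattern until nothing changes."""
    while True:
        stripped = SCRIPT_RE.sub("", text)
        if stripped == text:
            return stripped
        text = stripped


print(repr(strip_until_stable(nested)))  # ''
```

Repeating the substitution happens to clear this particular input, 
but it does nothing for the whitespace variations, so it is not a 
fix for the pattern approach in general.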

I'd propose that it's better to use something such as 
BeautifulSoup that actually parses the HTML, and then skim 
through it whitelisting the tags you plan to allow, and skipping 
the emission of any tags that don't make the whitelist.
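
To make the whitelisting idea concrete, here is a rough sketch.  It 
uses the standard library's html.parser rather than BeautifulSoup, 
purely so it runs without installing anything; the overall shape 
(parse, then emit only whitelisted tags) is the same.  The names 
ALLOWED_TAGS, DROP_CONTENT, WhitelistFilter and sanitize, and the 
choice of allowed tags, are all assumptions for illustration.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "b", "i", "em", "strong"}  # hypothetical whitelist
DROP_CONTENT = {"script", "style"}  # drop these elements' text too


class WhitelistFilter(HTMLParser):
    """Re-emit only whitelisted tags; everything else is skipped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            # Attributes are deliberately not copied (no onclick etc.).
            self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Text is escaped, so stray "<" or ">" cannot form new tags.
            self.out.append(escape(data))


def sanitize(html):
    parser = WhitelistFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(sanitize("<p>hi</p><script>doEvil()</script>"))  # <p>hi</p>
print(sanitize('<b onclick="evil()">bold</b>'))        # <b>bold</b>
```

Because only whitelisted tags are ever written out and all text is 
escaped, the nested and spread-out tricks above cannot reassemble 
into a script tag in the output.  This is a sketch, not a vetted 
sanitizer.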

-tkc






