Trying to find regex for any script in an html source

28tommy 28tommy at gmail.com
Wed Dec 21 16:31:24 EST 2005


Hi,
I'm trying to find script tags in the HTML source of a page retrieved
from the web.
I'm trying to use the following pattern:

match = re.compile('<script [re.DOTALL]+ src=[re.DOTALL]+>')

I'm testing it on a page that includes the following source:

<script language="JavaScript1.2"
src="http://i.cnn.net/cnn/.element/ssi/js/1.3/mainVideoMod.js"
type="text/javascript"></script>

But I get None as my result.
Here's (in words) what I'm trying to do: match '<script ' followed by
any number of characters of any kind, then ' src=' followed by any
number of characters of any kind, and finally a closing '>'.

What am I doing wrong?
Thanks.
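
For reference, the rule described in words above could be sketched roughly as follows. This is only a minimal sketch under a few assumptions: re.DOTALL is passed as a flag to re.compile (written inside square brackets, [re.DOTALL] is just a character class matching the letters r, e, D, O, T, A, L and a dot), whitespace before src= is matched with \s because in the sample page src= follows a newline rather than a space, and the name find_script_tags is hypothetical. Regular expressions are fragile on real-world HTML, so treat this as an illustration rather than a robust parser.

```python
import re

# '.' matches newlines too because of the DOTALL flag; '.*?' is non-greedy,
# so the match stops at the first '>' after 'src='.
SCRIPT_TAG = re.compile(r'<script\s.*?\ssrc=.*?>', re.DOTALL | re.IGNORECASE)


def find_script_tags(html):
    """Return the text of every opening <script ... src=...> tag found in html.

    Hypothetical helper for illustration; uses finditer (a search over the
    whole string) rather than match, which only tests the start of the string.
    """
    return [m.group(0) for m in SCRIPT_TAG.finditer(html)]


# The sample tag from the page being tested.
sample = '''<script language="JavaScript1.2"
src="http://i.cnn.net/cnn/.element/ssi/js/1.3/mainVideoMod.js"
type="text/javascript"></script>'''

tags = find_script_tags(sample)
for tag in tags:
    print(tag)
```

The non-greedy quantifiers keep each match inside a single opening tag in the simple case; a page with an inline script followed later by some other element carrying a src attribute could still produce an over-long match.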



