Regular Expression problem

John Blogger blenderhack at gmail.com
Thu Jul 13 18:37:26 EDT 2006


(I don't know if this is the right place to ask. If it isn't, please
point me in the right direction.
If you masters read this post, I'm honoured. If I get even a
mere response, I'm blessed!)

Hi,

I'm a newbie regular expression user, and I use regexes in my Python
programs. I have a strange problem with a regex (perhaps not so strange,
but please bear in mind that I'm a newbie ;): I want to extract a
particular attribute value from a tag in one of my HTML files.

That is, I want only the value after 'href=' in this tag:

'<link href="mystylesheet.css" rel="stylesheet" type="text/css">'

Here it would be 'mystylesheet.css'. I used the following regex to get
this value (I don't know if it is any good):

<link\s+href=["]?(.*?)["]?\s+rel=["]?stylesheet["]?\s+type=["]?text/css["]?>

I thought I was doing fine until I got stuck on this tag:

<link rel="stylesheet" href="mystylesheet.css" type="text/css">

This is the same tag, but with the 'href=' part in a different place.
I think you get the point!
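
To make the problem concrete, here is a minimal sketch that reproduces it.
It assumes a current Python 3 and the standard `re` module; the names
PATTERN, tags, extract_href and results are only illustrative. The pattern
hard-codes the attribute order href, rel, type, so it matches the first
tag but not the reordered one.

```python
import re

# The pattern from this post, written as a raw string so that the
# backslashes reach the regex engine unchanged.
PATTERN = re.compile(
    r'<link\s+href=["]?(.*?)["]?\s+rel=["]?stylesheet["]?\s+type=["]?text/css["]?>'
)

tags = [
    '<link href="mystylesheet.css" rel="stylesheet" type="text/css">',
    '<link rel="stylesheet" href="mystylesheet.css" type="text/css">',
]


def extract_href(tag):
    """Return the captured href value, or None if the pattern does not match."""
    match = PATTERN.search(tag)
    return match.group(1) if match else None


results = [extract_href(tag) for tag in tags]
for tag, value in zip(tags, results):
    print(value, "<-", tag)
```

The first tag yields 'mystylesheet.css'; the second yields None, because
'href=' no longer directly follows '<link'.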

So what should I do to get the exact value (here, the value after
'href=') in every case, even if the tags look like any of these?

<link rel="stylesheet" href="mystylesheet.css" type="text/css">
-OR-
<link href="mystylesheet.css" rel="stylesheet" type="text/css">
-OR-
<link type="text/css" href="mystylesheet.css" rel="stylesheet">
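
One possible direction, offered only as a hedged sketch and not as a
definitive answer: let an HTML parser split the tag into attributes, so
that their order no longer matters. This assumes a current Python 3 and
its standard `html.parser` module; the class StylesheetLinkParser and the
function stylesheet_hrefs are hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


class StylesheetLinkParser(HTMLParser):
    """Collect the href of every <link> tag whose rel is "stylesheet"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser passes the tag name in lower case and the attributes
        # as (name, value) pairs with the surrounding quotes removed.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        href = attributes.get("href")
        if rel == "stylesheet" and href:
            self.hrefs.append(href)


def stylesheet_hrefs(html):
    """Return the stylesheet hrefs found in html, in document order."""
    parser = StylesheetLinkParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs


sample = """
<link rel="stylesheet" href="mystylesheet.css" type="text/css">
<link href="mystylesheet.css" rel="stylesheet" type="text/css">
<link type="text/css" href="mystylesheet.css" rel="stylesheet">
"""
print(stylesheet_hrefs(sample))
```

With the three tags above as input, each one contributes
'mystylesheet.css' to the result, whatever the attribute order.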




More information about the Python-list mailing list