Regular Expression problem

cdecarlo cdecarlo at gmail.com
Thu Jul 13 18:50:37 EDT 2006


Hey,

I'm new to regexes as well, but here is my idea. Since you don't know
which attribute will come first, why don't you structure your regex like
this:

(First off, I'll write \s as if it were just ' '. Strictly speaking, \s
matches any whitespace character (spaces, tabs, newlines), which works
just as well here.)

'<link\s*((\s*attribute1\s*)|(\s*attribute2\s*)|(\s*attribute3\s*))+>'

I think that should just about do it.
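To make the alternation idea concrete, here is a minimal Python sketch. It assumes (my assumption, not something stated in the question) that the tag only carries href, rel and type, each with a double-quoted value. LINK_RE is just an illustrative name.

```python
import re

# One alternative per attribute, repeated with "+" so the attributes
# may appear in any order.
# Assumption: only href, rel and type occur, all values in double quotes.
LINK_RE = re.compile(
    r'<link'
    r'(?:\s+'                         # \s+ matches any whitespace, not just ' '
    r'(?:href="(?P<href>[^"]*)"'      # alternative 1: capture the href value
    r'|rel="[^"]*"'                   # alternative 2: accept rel, ignore its value
    r'|type="[^"]*")'                 # alternative 3: accept type, ignore its value
    r')+'                             # repeat: attributes in any order
    r'\s*>'
)

tags = [
    '<link rel="stylesheet" href="mystylesheet.css" type="text/css">',
    '<link href="mystylesheet.css" rel="stylesheet" type="text/css">',
    '<link type="text/css" href="mystylesheet.css" rel="stylesheet">',
]

for tag in tags:
    match = LINK_RE.search(tag)
    # Python keeps the last value a group captured inside a repetition,
    # so "href" still holds the value even if it was not the last attribute.
    print(match.group("href") if match else None)
```

This prints mystylesheet.css for each of the three tags. The catch with this design is that every possible attribute has to be listed. A tag with an extra attribute such as media="screen" will not match at all.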

Hope this helped,

Colin

John Blogger wrote:
> (I don't know if this is the right place, so if I am wrong, please point
> me in the right direction.
> If this post is read by you masters, I'm honoured. If I am getting a
> mere response, I'm blessed!)
>
> Hi,
>
> I'm a newbie regular expression user. I use regex in my Python
> programs. I have a strange (sometimes not strange, but please bear in
> mind, I'm a newbie ;) problem using regex. I want to extract a
> particular tag value from one of my HTML files.
>
> That is, I want only the value after 'href=' in this tag:
>
> '<link href="mystylesheet.css" rel="stylesheet" type="text/css">'
>
> Here it would be 'mystylesheet.css'. I used the following regex to get
> this value (I don't know if it is good):
>
> "<link\s+href=["]?(.*?)["]?\s+rel=["]?stylesheet["]?\s+type=["]?text/css["]?>"
>
> I thought I was doing fine until I got stuck on this tag:
>
> <link rel="stylesheet" href="mystylesheet.css" type="text/css">
>
> It's the same tag, but with the 'href=' part in a different place. I
> think you get the point!
>
> So what should I do to get the exact value (here the value after
> 'href=') in every case, even if the tags look like these?
>
> <link rel="stylesheet" href="mystylesheet.css" type="text/css">
> -OR-
> <link href="mystylesheet.css" rel="stylesheet" type="text/css">
> -OR-
> <link type="text/css" href="mystylesheet.css" rel="stylesheet">
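Here is a minimal Python sketch for the three variants above. FIXED_ORDER_RE is the question's regex written as a raw string, and it shows why the original breaks. HREF_RE swaps in a different technique: a lookahead that scans the rest of the tag for href=... without consuming it, so the attribute's position no longer matters. It assumes the value is double-quoted, single-quoted, or unquoted without spaces. Both names are illustrative.

```python
import re

# The question's pattern: it only matches when href, rel and type
# appear in exactly that order.
FIXED_ORDER_RE = re.compile(
    r'<link\s+href=["]?(.*?)["]?\s+rel=["]?stylesheet["]?'
    r'\s+type=["]?text/css["]?>'
)

# Order-independent sketch: the lookahead (?=...) searches ahead inside the
# tag for " href=" and captures its value, then [^>]*> consumes the tag.
# Assumption: the value is quoted with " or ', or unquoted without spaces.
HREF_RE = re.compile(
    r'<link\b'                                     # start of a <link> tag
    r'(?=[^>]*?\shref\s*=\s*["\']?([^"\'\s>]+))'   # find href anywhere, capture value
    r'[^>]*>'                                      # then consume the whole tag
)

tags = [
    '<link rel="stylesheet" href="mystylesheet.css" type="text/css">',
    '<link href="mystylesheet.css" rel="stylesheet" type="text/css">',
    '<link type="text/css" href="mystylesheet.css" rel="stylesheet">',
]

for tag in tags:
    old = FIXED_ORDER_RE.search(tag)
    new = HREF_RE.search(tag)
    print("fixed order:", old.group(1) if old else None,
          "| lookahead:", new.group(1) if new else None)
```

The fixed-order pattern only finds the value in the second tag, while the lookahead version finds it in all three. For real-world HTML (comments, attribute values containing >, and so on), the standard library's html.parser module is sturdier than any single regex.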




More information about the Python-list mailing list