Regular Expression problem

Barry barrynyc at gmail.com
Sun Jul 16 06:52:56 EDT 2006


On 13 Jul 2006 23:12:05 -0700, Paul McGuire <ptmcg at austin.rr.com> wrote:
>
> Pyparsing is also good for recognizing basic HTML tags and their
> attributes, regardless of the order of the attributes.
>
> -- Paul
>
> testText = """sldkjflsa;faj
>
> <link href="mystylesheet.css" rel="stylesheet" type="text/css">
>
> here it would be 'mystylesheet.css'. I used the following regex to get
> this value(I dont know if it
>
> I thought I was doing fine until I got stuck by this tag >>
>
> <link rel="stylesheet" href="mystylesheet.css" type="text/css">  : same
>
> tag but with 'href=' part
>
> tags are like these? >>
>
> <link rel="stylesheet" href="mystylesheet.css" type="text/css">
> -OR-
> <link href="mystylesheet.css" rel="stylesheet" type="text/css">
> -OR-
> <link type="text/css" href="mystylesheet.css" rel="stylesheet">
>
> """
> from pyparsing import makeHTMLTags,line
>
> linkTag = makeHTMLTags("link")[0]
> for toks,s,e in linkTag.scanString(testText):
>     print toks.href
>     print line(s,testText)
>     print
>
> Prints out:
>
> mystylesheet.css
> <link href="mystylesheet.css" rel="stylesheet" type="text/css">
>
> mystylesheet.css
> <link rel="stylesheet" href="mystylesheet.css" type="text/css">  : same
>
>
> mystylesheet.css
> <link rel="stylesheet" href="mystylesheet.css" type="text/css">
>
> mystylesheet.css
> <link href="mystylesheet.css" rel="stylesheet" type="text/css">
>
> mystylesheet.css
> <link type="text/css" href="mystylesheet.css" rel="stylesheet">


Less is more:

import re
pat = re.compile(r'href="([^"]+)')
m = pat.search(your_link)  # m.group(1) is the href value; m is None if no match
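
For what it's worth, here is a minimal sketch of that pattern run against the three attribute orders from the quoted message. The names SAMPLE and first_href are made up for illustration, and it assumes the href value is always double-quoted. Note that the pattern matches an href in any tag, not only <link>.

```python
import re

# The three attribute orders from the quoted message.
SAMPLE = '''
<link rel="stylesheet" href="mystylesheet.css" type="text/css">
<link href="mystylesheet.css" rel="stylesheet" type="text/css">
<link type="text/css" href="mystylesheet.css" rel="stylesheet">
'''

# Capture everything between href=" and the next double quote.
pat = re.compile(r'href="([^"]+)')


def first_href(tag):
    """Return the first double-quoted href value in tag, or None."""
    m = pat.search(tag)
    return m.group(1) if m else None


# findall returns the captured group for every match in the text.
hrefs = pat.findall(SAMPLE)
print(hrefs)
```

Since the pattern never looks at what comes before or after href=, attribute order does not matter. If single-quoted or unquoted values turn up, the pattern would need to be widened, or a parser such as the pyparsing approach above may be the safer choice.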


More information about the Python-list mailing list