Regular Expression problem

Paul McGuire ptmcg at austin.rr.com
Fri Jul 14 02:12:05 EDT 2006


Pyparsing is also good for recognizing basic HTML tags and their
attributes, regardless of the order of the attributes.

-- Paul

testText = """sldkjflsa;faj

<link href="mystylesheet.css" rel="stylesheet" type="text/css">

here it would be 'mystylesheet.css'. I used the following regex to get
this value(I dont know if it

I thought I was doing fine until I got stuck by this tag >>

<link rel="stylesheet" href="mystylesheet.css" type="text/css">  : same

tag but with 'href=' part

tags are like these? >>

<link rel="stylesheet" href="mystylesheet.css" type="text/css">
-OR-
<link href="mystylesheet.css" rel="stylesheet" type="text/css">
-OR-
<link type="text/css" href="mystylesheet.css" rel="stylesheet">

"""
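from pyparsing import makeHTMLTags, line

# makeHTMLTags returns (openTag, closeTag); only the opening tag is needed here
linkTag = makeHTMLTags("link")[0]
# scanString yields (tokens, start, end) for each match found in the text
for toks, s, e in linkTag.scanString(testText):
    print(toks.href)            # tag attributes are exposed as named results
    print(line(s, testText))    # the full source line containing the match
    print()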

Prints out:

mystylesheet.css
<link href="mystylesheet.css" rel="stylesheet" type="text/css">

mystylesheet.css
<link rel="stylesheet" href="mystylesheet.css" type="text/css">  : same


mystylesheet.css
<link rel="stylesheet" href="mystylesheet.css" type="text/css">

mystylesheet.css
<link href="mystylesheet.css" rel="stylesheet" type="text/css">

mystylesheet.css
<link type="text/css" href="mystylesheet.css" rel="stylesheet">
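
For comparison, here is a rough sketch of the same order-independent lookup using only the standard library's html.parser instead of pyparsing. This is a swapped-in technique, not what the code above does, and the names LinkHrefCollector and link_hrefs are made up for illustration. It assumes the input is close enough to well-formed HTML for HTMLParser to tokenize.

```python
from html.parser import HTMLParser


class LinkHrefCollector(HTMLParser):
    """Collect the href of every <link> tag (hypothetical helper name)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in source order;
        # turning it into a dict makes the lookup independent of that order.
        if tag == "link":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def link_hrefs(text):
    """Return the href values of all <link> tags found in text."""
    parser = LinkHrefCollector()
    parser.feed(text)
    parser.close()
    return parser.hrefs


sample = """
<link rel="stylesheet" href="mystylesheet.css" type="text/css">
<link href="mystylesheet.css" rel="stylesheet" type="text/css">
<link type="text/css" href="mystylesheet.css" rel="stylesheet">
"""
print(link_hrefs(sample))
```

Unlike scanString, this sketch does not report where in the text each tag was found, so it only fits when the href values alone are wanted.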



