Regular Expression problem
Paul McGuire
ptmcg at austin.rr.com
Fri Jul 14 02:12:05 EDT 2006
Pyparsing is also good for recognizing basic HTML tags and their
attributes, regardless of the order of the attributes.
-- Paul
testText = """sldkjflsa;faj
<link href="mystylesheet.css" rel="stylesheet" type="text/css">
here it would be 'mystylesheet.css'. I used the following regex to get
this value(I dont know if it
I thought I was doing fine until I got stuck by this tag >>
<link rel="stylesheet" href="mystylesheet.css" type="text/css"> : same
tag but with 'href=' part
tags are like these? >>
<link rel="stylesheet" href="mystylesheet.css" type="text/css">
-OR-
<link href="mystylesheet.css" rel="stylesheet" type="text/css">
-OR-
<link type="text/css" href="mystylesheet.css" rel="stylesheet">
"""
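from pyparsing import makeHTMLTags, line

# makeHTMLTags returns an (opening tag, closing tag) pair; only the opening <link> tag is needed
linkTag = makeHTMLTags("link")[0]

# scanString yields (tokens, start, end) for every match, whatever the attribute order
for toks, s, e in linkTag.scanString(testText):
    print(toks.href)          # attribute values are available as named results
    print(line(s, testText))  # the full source line that contains the match
    print()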
Prints out:
mystylesheet.css
<link href="mystylesheet.css" rel="stylesheet" type="text/css">

mystylesheet.css
<link rel="stylesheet" href="mystylesheet.css" type="text/css"> : same

mystylesheet.css
<link rel="stylesheet" href="mystylesheet.css" type="text/css">

mystylesheet.css
<link href="mystylesheet.css" rel="stylesheet" type="text/css">

mystylesheet.css
<link type="text/css" href="mystylesheet.css" rel="stylesheet">
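Since the original question asked for a regular expression, here is a minimal standard-library sketch of a different technique (plain re, not pyparsing) that gets the same order independence in two steps: first match each whole <link ...> tag, then search inside that tag for the href attribute. The names LINK_TAG, HREF_ATTR, link_hrefs and sample are hypothetical, and the sketch assumes attribute values are quoted and contain no ">" character.

```python
import re

# Step 1 (hypothetical name): match a whole <link ...> tag, whatever order its attributes are in.
# Assumes no ">" appears inside any attribute value.
LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)

# Step 2 (hypothetical name): pull the href value out of one matched tag.
# Accepts double or single quotes; the backreference \1 requires the same closing quote.
HREF_ATTR = re.compile(r"""\bhref\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def link_hrefs(text):
    """Return the href value of every <link> tag in text, in document order."""
    hrefs = []
    for tag in LINK_TAG.finditer(text):
        m = HREF_ATTR.search(tag.group(0))
        if m:  # a <link> tag without href is simply skipped
            hrefs.append(m.group(2))
    return hrefs


sample = '''<link rel="stylesheet" href="mystylesheet.css" type="text/css">
<link type="text/css" href="mystylesheet.css" rel="stylesheet">
<link href='other.css' rel="stylesheet">'''

print(link_hrefs(sample))
```

Splitting the job in two keeps each regex simple: the first never needs to know where href sits, and the second only ever looks inside a single tag. It is still a quick fix rather than a tag parser; a ">" inside a quoted value or an unquoted href will defeat it.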