Parsing HTML
Paul McGuire
ptmcg at austin.rr.com
Thu Feb 8 17:45:37 EST 2007
On Feb 8, 4:15 pm, "mtuller" <mitul... at gmail.com> wrote:
> I was asking how to escape the quotation marks. I have everything
> working in pyparser except for that. I don't want to drop everything
> and go to a different parser.
>
> Can someone else help?
>
>
Mike -
pyparsing includes a helper for constructing HTML tags called
makeHTMLTags. This helper does more than wrap the given tag text
within <>'s: it also handles attributes, upper/lower case, and the
various styles of quoted strings. To use it, replace your Literal
definitions for spanStart and spanEnd with:
    spanStart, spanEnd = makeHTMLTags('span')
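
As a quick check of the quoting behavior described above, here is a minimal sketch. The sample tag strings and the `values` list are made up for illustration, and it assumes a pyparsing release in which makeHTMLTags exposes tag attributes as lower-cased named results:

```python
from pyparsing import makeHTMLTags

spanStart, spanEnd = makeHTMLTags("span")

# Hypothetical sample tags: double-quoted, single-quoted with upper-case
# names, and unquoted attribute values.
samples = [
    '<span class="hpPageText">',
    "<SPAN CLASS='hpPageText'>",
    "<span class=hpPageText>",
]

values = []
for tag in samples:
    attrs = spanStart.parseString(tag)
    # Attribute names are looked up in lower case; quotes are stripped.
    values.append(attrs["class"])

print(values)
```

In each case the attribute value should come back without its quotes, so no escaping of the quotation marks is needed in the grammar itself.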
If you don't want to match just *any* <span> tag, but only those
with class="hpPageText", then add this parse action to spanStart:
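    from pyparsing import ParseException

    def onlyAcceptWithTagAttr(attrname, attrval):
        # Build a parse action that rejects tags lacking attrname=attrval.
        def action(tagAttrs):
            if not (attrname in tagAttrs and tagAttrs[attrname] == attrval):
                # Raising ParseException makes this tag fail to match.
                raise ParseException("", 0, "")
        return action

    spanStart.setParseAction(onlyAcceptWithTagAttr("class", "hpPageText"))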
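
To show how the pieces fit together, here is a self-contained sketch. The sample HTML, the `pattern` expression, the "body" results name, and the use of SkipTo and scanString are illustrative assumptions, not taken from your script. It uses addParseAction rather than setParseAction, on the assumption that some pyparsing versions attach a parse action of their own to the start tag, which setParseAction would replace:

```python
from pyparsing import makeHTMLTags, ParseException, SkipTo

def onlyAcceptWithTagAttr(attrname, attrval):
    # Build a parse action that rejects tags lacking attrname=attrval.
    def action(tagAttrs):
        if not (attrname in tagAttrs and tagAttrs[attrname] == attrval):
            raise ParseException("", 0, "")
    return action

spanStart, spanEnd = makeHTMLTags("span")
# addParseAction keeps any parse action already attached to spanStart.
spanStart.addParseAction(onlyAcceptWithTagAttr("class", "hpPageText"))

# Hypothetical grammar: a filtered start tag, the text up to the end tag,
# and the end tag itself.
pattern = spanStart + SkipTo(spanEnd)("body") + spanEnd

# Hypothetical input with one unwanted and one wanted span.
html = """<span class="other">skip me</span>
<SPAN class='hpPageText'>keep me</SPAN>"""

matches = [tokens.body for tokens, start, end in pattern.scanString(html)]
print(matches)
```

scanString steps past the first span because the parse action raises ParseException there, so only the span with class hpPageText is reported.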
-- Paul