Parsing HTML

Paul McGuire ptmcg at austin.rr.com
Thu Feb 8 17:45:37 EST 2007


On Feb 8, 4:15 pm, "mtuller" <mitul... at gmail.com> wrote:
> I was asking how to escape the quotation marks. I have everything
> working in pyparser except for that. I don't want to drop everything
> and go to a different parser.
>
> Can someone else help?
>
>
Mike -

pyparsing includes a helper for constructing HTML tag expressions,
called makeHTMLTags.  It does more than just wrap the given tag name
in <>'s: it also handles attributes, upper/lower case, and the various
styles of quoted strings.  To use it, replace your Literal definitions
for spanStart and spanEnd with:

spanStart, spanEnd = makeHTMLTags('span')
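
As a rough illustration of what that buys you, here is a minimal sketch
that runs the start-tag expression over a few differently written
<span> tags and reads the class attribute back by name.  The sample
strings and the names samples, classes and closing are made up for this
example; they are not from the original script.

```python
from pyparsing import makeHTMLTags

# Expressions for <span ...> and </span>, matched case-insensitively.
spanStart, spanEnd = makeHTMLTags("span")

# Hypothetical inputs: double quotes, single quotes with upper case,
# and an unquoted attribute value ahead of the class attribute.
samples = [
    '<span class="hpPageText">',
    "<SPAN CLASS='hpPageText'>",
    '<span id=intro class="hpPageText">',
]

# Attribute names come back lower-cased and values come back unquoted,
# so the same key works for every sample.
classes = [spanStart.parseString(s)["class"] for s in samples]
print(classes)

# The closing-tag expression is case-insensitive as well.
closing = spanEnd.parseString("</SPAN>")
```

The point is that the quoting never has to be escaped or matched by
hand; the tag expression takes care of it.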

If you don't want to match just *any* <span> tag, but only those with,
say, class="hpPageText", then add this parse action to spanStart:

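from pyparsing import ParseException

def onlyAcceptWithTagAttr(attrname, attrval):
    # Build a parse action that rejects tags lacking attrname == attrval.
    def action(tagAttrs):
        if not (attrname in tagAttrs and tagAttrs[attrname] == attrval):
            # Raising ParseException makes the match fail at this tag.
            raise ParseException("", 0, "tag lacks %s=%r" % (attrname, attrval))
    return action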

spanStart.setParseAction(onlyAcceptWithTagAttr("class","hpPageText"))
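
Putting the pieces together, the following is only a sketch of how the
filtered tag might be used to pull text out of a page.  The html sample,
the "body" results name, and the use of SkipTo and scanString are
assumptions made for the illustration, not something taken from your
script.

```python
from pyparsing import makeHTMLTags, ParseException, SkipTo

def onlyAcceptWithTagAttr(attrname, attrval):
    # Parse action factory: reject any tag without attrname == attrval.
    def action(tagAttrs):
        if not (attrname in tagAttrs and tagAttrs[attrname] == attrval):
            raise ParseException("", 0, "tag lacks required attribute")
    return action

spanStart, spanEnd = makeHTMLTags("span")
spanStart.setParseAction(onlyAcceptWithTagAttr("class", "hpPageText"))

# Start tag, then everything up to the closing tag (named "body"),
# then the closing tag itself.
spanBody = spanStart + SkipTo(spanEnd)("body") + spanEnd

# Hypothetical page fragment with one non-matching and two matching spans.
html = """
<span class="other">skip me</span>
<SPAN CLASS='hpPageText'>keep "this" one</SPAN>
<span class=hpPageText id="x">and this</span>
"""

# scanString yields (tokens, start, end) for each place the expression
# matches; spans rejected by the parse action are simply passed over.
matches = [tokens.body for tokens, start, end in spanBody.scanString(html)]
print(matches)
```

With this input, only the text of the second and third spans should be
collected; the first span is rejected by the parse action.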


-- Paul




