How can I exclude a word by using re?

Tue Aug 16 09:18:28 EDT 2005

Just as you used "(?P<xxx>...)" in your re to assign the matching text
to the name "xxx", pyparsing lets you associate a name with an element
of your grammar using setResultsName.

Here is your original re:
r=re.compile(ur'valign=top>(?P<number>\d{1,2})</td><td[^>]*>\s{0,2}'
 ur'<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>'
 ur'(?P<name>.+)</td>',re.UNICODE|re.IGNORECASE)
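
As a quick check of what the named groups pick up, here is a minimal
sketch that runs the same pattern against a made-up table row. The
sample HTML is invented for illustration, and the u prefix is dropped on
the assumption that you are on Python 3, where all strings are already
Unicode:

```python
import re

# Same pattern as above, minus the Python 2 "u" string prefix.
pattern = re.compile(
    r'valign=top>(?P<number>\d{1,2})</td><td[^>]*>\s{0,2}'
    r'<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>'
    r'(?P<name>.+)</td>',
    re.UNICODE | re.IGNORECASE,
)

# Hypothetical sample row, not taken from the real page.
sample = (
    '<td valign=top>7</td><td class="t"> '
    '<a href="http://example.com/song.mp3" target=_blank>Some Song</a></td>'
)

m = pattern.search(sample)
if m:
    # Each (?P<xxx>...) group is available by name.
    print(m.group("number"), m.group("url"), m.group("name"))
```

Note that the "name" group comes back with the trailing </a> still
attached, since .+ runs all the way up to the final </td>.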

Here is the pyparsing expression:
valign + number.setResultsName("number") + tdEnd + \
            tdStart + SkipTo(aStart) + aStart + \
            SkipTo(tdEnd) + tdEnd

Here are the re and pyparsing pieces side by side:
re  =>  pyparsing
-----------------------
valign=top>          =>  valign = CaselessLiteral("valign=top>")
(?P<number>\d{1,2})  =>  number = Word(nums),
                         number.setResultsName("number")
</td>                =>  tdEnd
<td[^>]*>            =>  tdStart
\s{0,2}              =>  I don't know what this re does, so I just used
                         SkipTo(aStart)
<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>
                     =>  aStart (which returns a value whose named
                         attributes correspond to the HTML attributes,
                         such as href)
(?P<name>.+)         =>  SkipTo(tdEnd)  *** here is where we'll make our
                         change ***
</td>                =>  tdEnd

To capture the body of the second <td></td> tag pair, we'll add
setResultsName("name") to the pyparsing expression:
mp3Entry = valign + number.setResultsName("number") + tdEnd + \
            tdStart + SkipTo(aStart) + aStart + \
            SkipTo(tdEnd).setResultsName("name") + tdEnd

Now you should be able to extract the data using:
for toks,s,e in mp3Entry.scanString(targetHTML):
    print toks.number, toks.startA.href, toks.name
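
Putting the pieces together, a self-contained sketch might look like the
following. It assumes that tdStart/tdEnd and aStart/aEnd were built with
makeHTMLTags (their definitions are not shown above), that the sample
HTML row is made up for illustration, and that you are running Python 3
with a pyparsing release that still accepts the camelCase method names:

```python
from pyparsing import CaselessLiteral, SkipTo, Word, makeHTMLTags, nums

# Hypothetical sample row, not taken from the real page.
targetHTML = (
    '<tr><td valign=top>1</td><td class="t"> '
    '<a href="http://example.com/song.mp3" target=_blank>Some Song</a>'
    '</td></tr>'
)

valign = CaselessLiteral("valign=top>")
number = Word(nums)
# Assumption: the tag expressions come from makeHTMLTags, which returns
# an (opening tag, closing tag) pair of expressions.
tdStart, tdEnd = makeHTMLTags("td")
aStart, aEnd = makeHTMLTags("a")

mp3Entry = (
    valign + number.setResultsName("number") + tdEnd
    + tdStart + SkipTo(aStart) + aStart
    + SkipTo(tdEnd).setResultsName("name") + tdEnd
)

# scanString yields (tokens, start, end) for each match in the input.
results = []
for toks, start, end in mp3Entry.scanString(targetHTML):
    results.append((toks.number, toks.startA.href, toks.name))
    print(toks.number, toks.startA.href, toks.name)
```

As with the re version, "name" holds everything up to the closing </td>,
so you may want to trim the trailing </a> yourself.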

Good luck!
-- Paul