How can I exclude a word by using re?
Paul McGuire
ptmcg at austin.rr.com
Tue Aug 16 09:18:28 EDT 2005
Just as you were using "?P<xxx>" in your re to assign the matching text
to the name "xxx", pyparsing lets you associate a name with an element
of your grammar using setResultsName.
Here is your original re:
r=re.compile(ur'valign=top>(?P<number>\d{1,2})</td><td[^>]*>\s{0,2}'
ur'<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>'
ur'(?P<name>.+)</td>',re.UNICODE|re.IGNORECASE)
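For reference, here is a minimal sketch of how the named groups in that re can be read back. It is written for Python 3 (so the ur'' prefix becomes r''), and the sample row is invented for illustration; the real page may well look different.

```python
import re

# Hypothetical sample row, invented for illustration only.
sample = ('<td valign=top>7</td><td class="x"> '
          '<a href="songs/track07.mp3" target=_blank>Track Seven</a></td>')

# Same pattern as above, with plain raw strings for Python 3.
pattern = re.compile(
    r'valign=top>(?P<number>\d{1,2})</td><td[^>]*>\s{0,2}'
    r'<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>'
    r'(?P<name>.+)</td>',
    re.IGNORECASE)

m = pattern.search(sample)
# groupdict() returns only the named groups; the unnamed "( )" group is left out.
fields = m.groupdict()
print(fields["number"], fields["url"], fields["name"])
```

Note that (?P<name>.+) takes everything up to the closing </td>, so with this sample the name still carries the trailing </a>.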
Here is the pyparsing expression:
valign + number.setResultsName("number") + tdEnd + \
tdStart + SkipTo(aStart) + aStart + \
SkipTo(tdEnd) + tdEnd
Here are the re and pyparsing pieces side by side:
re                     => pyparsing
---------------------------------------------------------------
valign=top>            => valign = CaselessLiteral("valign=top>")
(?P<number>\d{1,2})    => number = Word(nums),
                          number.setResultsName("number")
</td>                  => tdEnd
<td[^>]*>              => tdStart
\s{0,2}                => I don't know what this re does, so I just used
                          SkipTo(aStart)
<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>
                       => aStart (which returns a value whose named
                          attributes correspond to the HTML attributes,
                          such as href)
(?P<name>.+)           => SkipTo(tdEnd) *** here is where we'll make our
                          change ***
</td>                  => tdEnd
To capture the body of the second <td></td> tag pair, we'll add
setResultsName("name") to the pyparsing expression:
mp3Entry = valign + number.setResultsName("number") + tdEnd + \
    tdStart + SkipTo(aStart) + aStart + \
    SkipTo(tdEnd).setResultsName("name") + tdEnd
Now you should be able to extract the data using:
for toks,s,e in mp3Entry.scanString(targetHTML):
    print toks.number, toks.starta.href, toks.name
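Putting the pieces together, a self-contained sketch might look like the following. It assumes Python 3 with pyparsing installed, that tdStart, tdEnd and aStart come from makeHTMLTags, and that targetHTML is shaped like the invented sample below. The href is read as toks.href here, since pyparsing also exposes tag attributes as top-level named results; treat that as one way to do it, not the only one.

```python
from pyparsing import CaselessLiteral, SkipTo, Word, makeHTMLTags, nums

# Hypothetical sample input, invented for illustration only.
targetHTML = ('<tr><td valign=top>7</td><td class="x"> '
              '<a href="songs/track07.mp3" target=_blank>Track Seven</a></td></tr>')

valign = CaselessLiteral("valign=top>")
number = Word(nums)
# Assumed definitions: makeHTMLTags returns (opening tag, closing tag) expressions.
tdStart, tdEnd = makeHTMLTags("td")
aStart, aEnd = makeHTMLTags("a")

mp3Entry = (valign + number.setResultsName("number") + tdEnd
            + tdStart + SkipTo(aStart) + aStart
            + SkipTo(tdEnd).setResultsName("name") + tdEnd)

# scanString yields (tokens, start, end) for each match found in the input.
rows = [(toks.number, toks.href, toks.name)
        for toks, s, e in mp3Entry.scanString(targetHTML)]
for row in rows:
    print(row)
```

As with the re, SkipTo(tdEnd) keeps everything up to the closing </td>, so the name still includes the trailing </a> unless you skip to aEnd instead.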
Good luck!
-- Paul