How can I exclude a word by using re?

could ildg could.net at gmail.com
Tue Aug 16 01:42:38 EDT 2005


Thank you,
your code using pyparsing works very well. Now I get the "number" and
the "url", but I still want to get the "name".
I'll turn to pyparsing and see how to get the "name" from the HTML,
but I hope you can enlighten me one more time, since I'm not
familiar with the pyparsing module.
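
For comparison, here is a minimal sketch of pulling all three fields
("number", "url" and "name") with the standard-library re module rather
than pyparsing. It is modeled on the commented-out regex in the quoted
script below. The sample HTML, the example.com URLs and the helper name
extract_entries are invented for illustration, and it assumes the "name"
is the link text between the <a ...> tag and </a>. It is written for
Python 3 and has not been run against the real page.

```python
import re

# Pattern modeled on the commented-out regex in the quoted script.
# Assumption: the "name" is the link text, i.e. everything between the
# opening <a ...> tag and the closing </a> that precedes </td>.
ENTRY_RE = re.compile(
    r'valign=top>(?P<number>\d{1,2})</td><td[^>]*>\s{0,2}'
    r'<a href="(?P<url>[^<>"]+\.mp3)"\s+target=_blank>'
    r'(?P<name>.+?)</a>\s*</td>',
    re.IGNORECASE | re.DOTALL,
)


def extract_entries(html):
    """Return a list of (number, url, name) tuples found in html."""
    return [
        (m.group("number"), m.group("url"), m.group("name").strip())
        for m in ENTRY_RE.finditer(html)
    ]


# Hypothetical sample markup; the real page may be laid out differently.
SAMPLE_HTML = (
    '<tr><td valign=top>1</td><td class="t"> '
    '<a href="http://example.com/a.mp3" target=_blank>First Song</a></td></tr>\n'
    '<tr><td valign=top>12</td><td>'
    '<a href="http://example.com/b.mp3" target=_blank>Second Song</a></td></tr>'
)

for number, url, name in extract_entries(SAMPLE_HTML):
    print(number, url, name)
```

The non-greedy (?P<name>.+?) stops at the first </a>, so the name does
not swallow the closing tags the way a greedy .+ before </td> would.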

On 15 Aug 2005 21:15:02 -0700, Paul McGuire <ptmcg at austin.rr.com> wrote:
> Given the example re that you've been trying to get working, here is a
> pyparsing approach that might be more, um, approachable.
> Unfortunately, since I don't have the URL of the page you are working
> with, I'm unable to test this before posting.
> 
> Good luck,
> -- Paul
> 
> # getMP3s.py
> # get pyparsing at http://pyparsing.sourceforge.net
> #
> 
> from pyparsing import *
> import urllib
> 
> #~ r=re.compile(ur'valign=top>(?P<number>\d{1,2})</td><td[^>]*>\s{0,2}'
> #~ ur'<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>'
> #~ ur'(?P<name>.+)</td>',re.UNICODE|re.IGNORECASE)
> 
> tdStart,tdEnd = makeHTMLTags("td")
> aStart,aEnd = makeHTMLTags("a")
> 
> number = Word(nums)
> valign = CaselessLiteral("valign=top>")
> 
> mp3Entry = valign + number.setResultsName("number") + tdEnd + \
>             tdStart + SkipTo(aStart) + aStart + \
>             SkipTo(tdEnd) + tdEnd
> 
> # get list of mp3's
> targetURL = "http://whatever"
> targetPage = urllib.urlopen( targetURL )
> targetHTML = targetPage.read()
> targetPage.close()
> 
> for toks,s,e in mp3Entry.scanString(targetHTML):
>     print toks.number, toks.starta.href
> 


