Parsing HTML
mtuller
mituller at gmail.com
Thu Feb 8 14:38:14 EST 2007
I am trying to parse a web page and extract information from it using
pyparsing. Here is what I have:
from pyparsing import *
import urllib
# define basic text pattern
spanStart = Literal('<span class=\"hpPageText\">')
spanEnd = Literal('</span></td>')
printCount = spanStart + SkipTo(spanEnd) + spanEnd
# get printer addresses
printerURL = "http://printer.mydomain.com/hp/device/this.LCDispatcher?nav=hp.Usage"
printerListPage = urllib.urlopen(printerURL)
printerListHTML = printerListPage.read()
printerListPage.close()
for srvrtokens, startloc, endloc in printCount.scanString(printerListHTML):
    print srvrtokens
print printCount
I added the last print statement to check what pattern is being used,
because scanString is returning nothing. It prints:
{"<span class="hpPageText">" SkipTo:("</span></td>") "</span></td>"}
If I remove the "hpPageText" part from the pattern I get results back,
but more than I want. I think it has something to do with escaping the
quotation marks, but I am puzzled as to how to do it.
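To narrow down the escaping question, here is a minimal sketch that uses only plain string operations, not pyparsing. The sample markup and the helper name extract_between are made up for illustration, and the real printer page may look different. It checks two things: whether the backslash-escaped literal differs from the unescaped one, and whether an exact literal still matches when the page's markup varies slightly.

```python
# Hypothetical sample markup; the real printer page may differ.
sample_html = '<td><span class="hpPageText">12345</span></td>'

escaped = '<span class=\"hpPageText\">'
plain = '<span class="hpPageText">'

# Inside a single-quoted string, \" is just ", so both spellings are equal.
same_literal = (escaped == plain)


def extract_between(text, start, end):
    """Return the text between the first start marker and the next end marker, or None."""
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)
    stop = text.find(end, begin)
    if stop == -1:
        return None
    return text[begin:stop]


# The exact literal matches the sample markup.
count = extract_between(sample_html, escaped, '</span></td>')

# An exact literal is sensitive to small markup differences, such as
# single-quoted attributes or whitespace between the closing tags.
variant_html = "<td><span class='hpPageText'>12345</span> </td>"
missed = extract_between(variant_html, plain, '</span></td>')

print(same_literal, count, missed)
```

If the two literals compare equal, the escaping itself is probably not the cause, and it may be worth comparing the pattern character by character against the raw text returned by read().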
Thanks,
Mike