[XML-SIG] Extracting info from XHTML with Xpath

Wed Mar 24 17:58:09 EST 2004

	Subject: [XML-SIG] Extracting info from XHTML with Xpath
	Date: Wed, Mar 24, 2004 at 03:58:07 -0600


Quoting Tim Wilson (wilson at visi.com):
> Hi everyone,
> 
> I'm going to be teaching a course on building Web pages with Web standards
> and I thought it would be fun to show a little demo of a python script that
> could extract information from an XHTML document. I found Simon Willison's
> description of using Xpath and Python, but I haven't had any luck getting an
> Xpath expression that works.
> 
> I've got a Web page at
> 
> http://www.hopkins.k12.mn.us/Pages/district/special/pq/timelytopics.html
> 
> that lists a bunch of upcoming tech classes in our school district. I'd like
> to extract the coursetitles and dates.
> 
> Would anyone be willing to have a quick look at the source for that page and
> suggest a way to address the <h3 class="coursetitle"> and <p class="date">
> information?
>

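Perhaps something like this, using PyXML:

from xml.dom.ext.reader import PyExpat
from xml.xpath import Evaluate
from xml.dom.ext import PrettyPrint

# Select every <h3> whose class attribute is "coursetitle"
path0 = '//h3[@class="coursetitle"]'

# Fetch the page and parse it into a DOM tree
reader = PyExpat.Reader()
dom = reader.fromUri('http://www.hopkins.k12.mn.us/Pages/district/special/pq/timelytopics.html')

# Evaluate the XPath expression from the document root and print each match
myElements = Evaluate(path0, dom.documentElement)
for element in myElements:
    PrettyPrint(element)

The same call with '//p[@class="date"]' should pick up the dates.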
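
If PyXML is not available, roughly the same idea can be sketched with the standard library's xml.etree.ElementTree, which supports a limited XPath subset. The SAMPLE markup below is made up for illustration (the real page's source is not reproduced here), and the sketch assumes the page declares the XHTML namespace and that each course title is followed by exactly one date, so the two lists can be paired in order. The helper name extract_courses is hypothetical. Note that when a document declares the XHTML namespace, an unprefixed name such as h3 in an XPath expression does not match the namespaced elements, which is why the prefix "x" is mapped explicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the real page may be structured differently.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h3 class="coursetitle">Intro to Web Standards</h3>
    <p class="date">April 5, 2004</p>
    <h3 class="coursetitle">CSS Basics</h3>
    <p class="date">April 12, 2004</p>
  </body>
</html>"""

# Map a prefix of our choosing to the XHTML namespace URI.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def text_of(element):
    """Return all text inside an element, including nested markup."""
    return "".join(element.itertext()).strip()


def extract_courses(xhtml_text):
    """Return (title, date) pairs, assuming titles and dates alternate."""
    root = ET.fromstring(xhtml_text)
    titles = root.findall(".//x:h3[@class='coursetitle']", NS)
    dates = root.findall(".//x:p[@class='date']", NS)
    # Pair by position; this breaks if a course has no date or several.
    return [(text_of(t), text_of(d)) for t, d in zip(titles, dates)]


if __name__ == "__main__":
    for title, date in extract_courses(SAMPLE):
        print(title, "-", date)
```

Pairing by position is the simplest choice for a demo; if some courses lack a date, it would be safer to walk the body in document order and attach each date to the most recent title.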


-- lm