python mechanize/libxml2dom question

Paul Boddie paul at boddie.org.uk
Tue Sep 2 04:52:41 EDT 2008


On 2 Sep, 05:35, "bruce" <bedoug... at earthlink.net> wrote:
>
> i've got the following situation, with the following test url:
> "http://schedule.psu.edu/soc/fall/Alloz/a-c/acctg.html#".
>
> i can generate a list of the tables i want for the courses on the page.
> however, when i try to create the xpath query, and plug it into the xpath
> within python, i'm missing something. if i have a parent xpath query, that
> generates a list of results/nodes... how can i then use the individual
> parent node, and trigger off of it, to get further information.

You can always use the parentNode property on the nodes you get as
results from the XPath query, but I guess what you want to do is to
"rewind" and issue queries relative to some ancestor of the result
nodes.
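
For what it's worth, here is a rough sketch of the "rewind" idea. I'm using the standard library's xml.dom.minidom as a stand-in, on the assumption that its parentNode property behaves like the one on libxml2dom nodes. The markup and the nearest_ancestor helper are made up for illustration and are not part of either library.

```python
from xml.dom.minidom import parseString

# Hypothetical, heavily simplified markup; the real page is more complex.
SAMPLE = """<html><body>
<table id="course"><tr>
<td><font><a name="c1">anchor</a><a href="#c1">Course One</a></font></td>
<td width="85%">title</td>
</tr></table>
</body></html>"""


def nearest_ancestor(node, tag_name):
    """Follow parentNode links until an element named tag_name is found."""
    current = node.parentNode
    while current is not None:
        if current.nodeType == current.ELEMENT_NODE and current.tagName == tag_name:
            return current
        current = current.parentNode
    return None  # ran off the top of the document


doc = parseString(SAMPLE)
link = doc.getElementsByTagName("a")[1]   # the second link, as in a[2]
table = nearest_ancestor(link, "table")   # "rewind" to the enclosing table
print(table.getAttribute("id"))
```

Once you hold the ancestor node, further queries can start from it instead of from the document root.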

[...]

> # **** course names
>
> cpath='//table[position()>0]/descendant::td[position()=2][@width="85%"]/../td[1]/font/a[2]/text()'

This obviously gets you right down to the hyperlink text within a part
of the table. However, it may be easier to break this query up in
order to get a more manageable overview of the process. My
understanding of the above query is that it can first be rewritten as
the following:

cpath = "//table//td[position()=2 and @width='85%']/../td[1]/font/a[2]/text()"

Or even this, which is slightly looser because it accepts the first
cell of any row in a matching table:

cpath = "//table[.//td[position()=2 and @width='85%']]//td[1]/font/a[2]/text()"

But what you could do is to obtain the important tables first:

tables = d.xpath("//table[.//td[position()=2 and @width='85%']]")

Here, we use the bracketed term to ensure that the table is the right
one, but we don't actually descend inside the table.

You could, from this, get the name by doing a query from each of these
tables:

for table in tables:
    cnames = table.xpath(".//td[1]/font/a[2]/text()") # list of text nodes

You might want to consider a slightly safer approach when getting the
text:

    cnames = table.xpath(".//td[1]/font/a[2]") # list of nodes, should be one
    name = cnames[0].textContent # all the text from the link

When looking for the details, you can then write your query relative
to these tables, rather than having to figure out the location of the
details from the text nodes you've just extracted.

    details = table.xpath("following-sibling::table[1]") # list of max 1 node
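
To show the whole pattern in one place, here is a self-contained sketch that runs without libxml2dom. It uses the standard library's xml.etree.ElementTree, whose XPath subset is smaller, so two things are emulated by hand: the nested predicate on the table, and the following-sibling axis. The sample markup is hypothetical, and I'm assuming the course table and its details table share the same parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each course table is followed by a details table.
SAMPLE = """<body>
<table><tr>
<td><font><a name="c1"/><a href="#c1">Course <b>One</b></a></font></td>
<td width="85%">title</td>
</tr></table>
<table><tr><td>details for course one</td></tr></table>
<table><tr>
<td><font><a name="c2"/><a href="#c2">Course Two</a></font></td>
<td width="85%">title</td>
</tr></table>
<table><tr><td>details for course two</td></tr></table>
</body>"""


def is_course_table(table):
    # Stand-in for the predicate [.//td[position()=2 and @width='85%']],
    # which ElementTree cannot express as a single path.
    return any(td.get("width") == "85%" for td in table.findall(".//td[2]"))


root = ET.fromstring(SAMPLE)
siblings = list(root)  # children of <body> in document order
results = []
for index, table in enumerate(siblings):
    if table.tag != "table" or not is_course_table(table):
        continue
    # Query relative to the table rather than to the document root.
    links = table.findall(".//td[1]/font/a[2]")
    if not links:
        continue
    name = "".join(links[0].itertext())  # all the text, like textContent
    # Emulates following-sibling::table[1]: a list of at most one node.
    details = [el for el in siblings[index + 1:] if el.tag == "table"][:1]
    text = "".join(details[0].itertext()).strip() if details else None
    results.append((name, text))

for name, text in results:
    print(name, "->", text)
```

With libxml2dom the two emulated steps should collapse back into the plain xpath() calls shown above.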

> i'm assuming that there's a libxml2node method that will do what i need that
> i'm missing...

You should be able to issue XPath queries from any node. There have
been issues with libxml2dom and attribute nodes obtained from XPath,
but these were fixed in recent changesets.

Paul


