python/xpath question...

Mon Feb 16 14:03:45 EST 2009

bruce wrote:
> hi...
> 
> using libxml2dom as the xpath lib
> 
> i've got a situation where i can have:
>  foo=a.xpath("/html/body/table[2]/tr[45]/td")
> and i can get
>  11 as the number of returned td elements for the 45th row...
> 
> this is as it should be.
> 
> however, if i do:
>  foo=a.xpath("/html/body/table[2]/tr")
> 
> and then try to iterate through to the 45th "tr", and try to get the number
> of "td" elements..
> i can't seem to get the additional xpath that has to be used,
> 
> i've tried a number of the following with no luck...
>   l1 = libxml2dom.toString(tmp_[0])
>   print "l1 = "+l1+"\n"
> 
>   ldx = 0
>   for l in tmp_:
>     print "ld ="+str(ldx)
>     if ldx==45:
>       #needs to be a better way...
>       #l1 = libxml2dom.toString(tmp_[0])
>       l1 = libxml2dom.toString(l)
>       #print "1111 = ",l1
> 
>       q1 = libxml2dom
>       b1 = q1.parseString(l1, html=1)
>       #dd1 = b1.xpath("//td[not(@width)]")
>       #data = b1.xpath("//td/font")
>       #data = b1.xpath("//td[@valign='top'][not(@width)]")
>       #data = b1.xpath("//child::td[position()>0][@valign='top'][not(@width)]")
>       #data = b1.xpath("//td/parent::*/td[@valign='top'][not(@width)]")
>       #data = b1.xpath("//td[position()]")
>       #data = b1.xpath("//parent::tr[position()=1]/td")
>       data = b1.xpath("//td[@valign='top'][not(@width)]")
> 
> 
> it appears that i somehow need to get the direct child/node of the parent
> "tr" that's the "td"...
> it looks like using ("//td..." gets all the underlying child "td"... as
> opposed to the direct
> 1st level child/siblings... any thoughts/pointers would be appreciated...

  - You don't give enough information, since you don't provide the HTML.
  - The code above is obviously not the code you are running: nothing in
it increments your counter variable ldx.
  - Using l as a variable name is extremely confusing, because it is hard
to distinguish from 1 (the number). Using l1 is even worse.
  - XPath positions count from 1, whereas Python indexing is 0-based, as
is your code. So you most probably have an off-by-one error: tr[45] in
XPath corresponds to index 44 in your loop, not 45.
  - You should read an XPath tutorial. "//td" selects *all* td elements
below the document root, as is clearly stated here:
http://www.w3.org/TR/xpath#path-abbrev. So it's no wonder you get more
than you expect. Direct child nodes are found by simply omitting the
axis specifier: the child axis is the default, so a relative path such
as "td", evaluated on the tr node, selects only its direct td children.
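
To illustrate the last two points, here is a minimal sketch. It uses the
standard library's xml.etree.ElementTree rather than libxml2dom, and
ElementTree only implements a subset of XPath. The markup is a made-up
stand-in for the page, which was not posted. With libxml2dom the same idea
would presumably be to call xpath("td") on the tr node itself instead of
re-parsing it, but that is an assumption and not tested here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the original page: the second row holds a
# nested table, so "all td below this row" differs from "direct td children".
DOC = """
<html><body>
  <table>
    <tr><td>a</td><td>b</td></tr>
    <tr>
      <td>c</td>
      <td><table><tr><td>nested 1</td><td>nested 2</td></tr></table></td>
    </tr>
  </table>
</body></html>
"""

root = ET.fromstring(DOC)

# Direct tr children of the outer table, as a 0-based Python list.
rows = root.findall("./body/table/tr")

# XPath positions start at 1, Python indexes at 0: tr[2] is rows[1].
second_row = root.find("./body/table/tr[2]")
assert second_row is rows[1]

# No axis given: the child axis is used, so only direct children match.
direct_cells = second_row.findall("td")

# Descendant search: also picks up the cells of the nested table.
all_cells = second_row.findall(".//td")

print(len(direct_cells), len(all_cells))
```

Run against this sample, the relative "td" path finds fewer cells than the
descendant search, which is the same effect as getting more td elements
than expected from "//td".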

Diez