lxml to parse html

Mon Jan 23 03:41:23 EST 2012

contro opinion wrote:

> import lxml.html
> myxml='''
> <cooperate>
> <job DecreaseHour="1" table="tpa_radio_sum">
> </job>
> 
> <job DecreaseHour="2" table="tpa_radio_sum">
> </job>
> 
> 
> <job DecreaseHour="3" table="tpa_radio_sum">
> </job>
> </cooperate>
> '''
> root=lxml.html.fromstring(myxml)
> nodes1=root.xpath('//job[@DecreaseHour="1"]')
> nodes2=root.xpath('//job[@table="tpa_radio_sum"]')
> print "nodes1=",nodes1
> print "nodes2=",nodes2
> 
> 
>>>>
> nodes1= []
> nodes2= [<Element job at 0x1241240>, <Element job at 0x1362690>, <Element job at 0x13626c0>]
> 
> Would you mind telling me why nodes1=[]?

Try

nodes1 = root.xpath('//job[@decreasehour="1"]')

XPath is case-sensitive, and the HTML parser converts tag and attribute names to lowercase:

>>> lxml.html.fromstring("<JOB/>").tag
'job'
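
If the input really is XML rather than HTML, another option may be to skip the HTML parser altogether. The sketch below assumes the document is well-formed XML and uses lxml.etree.fromstring, which keeps names as written, so the original mixed-case query should match. The variable names (html_root, xml_root and so on) are only illustrative.

```python
import lxml.etree
import lxml.html

myxml = '''
<cooperate>
<job DecreaseHour="1" table="tpa_radio_sum"></job>
<job DecreaseHour="2" table="tpa_radio_sum"></job>
<job DecreaseHour="3" table="tpa_radio_sum"></job>
</cooperate>
'''

# HTML parser: tag and attribute names are lowercased while parsing,
# so the XPath expression has to use the lowercase attribute name.
html_root = lxml.html.fromstring(myxml)
html_miss = html_root.xpath('//job[@DecreaseHour="1"]')
html_hits = html_root.xpath('//job[@decreasehour="1"]')

# XML parser: names keep their original case, so the query from the
# original post works unchanged (assumes the input is well-formed XML).
xml_root = lxml.etree.fromstring(myxml)
xml_hits = xml_root.xpath('//job[@DecreaseHour="1"]')

print(len(html_miss), len(html_hits), len(xml_hits))
```

The XML parser is stricter than the HTML one, so this only helps when the markup is well-formed.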