lxml to parse html
Peter Otten
__peter__ at web.de
Mon Jan 23 03:41:23 EST 2012
contro opinion wrote:
> import lxml.html
> myxml='''
> <cooperate>
> <job DecreaseHour="1" table="tpa_radio_sum">
> </job>
>
> <job DecreaseHour="2" table="tpa_radio_sum">
> </job>
>
>
> <job DecreaseHour="3" table="tpa_radio_sum">
> </job>
> </cooperate>
> '''
> root=lxml.html.fromstring(myxml)
> nodes1=root.xpath('//job[@DecreaseHour="1"]')
> nodes2=root.xpath('//job[@table="tpa_radio_sum"]')
> print "nodes1=",nodes1
> print "nodes2=",nodes2
>
>
>>>>
> nodes1= []
> nodes2= [<Element job at 0x1241240>, <Element job at 0x1362690>, <Element
> job at 0x13626c0>]
>
> Would you mind telling me why nodes1 is []?
Try
nodes1 = root.xpath('//job[@decreasehour="1"]')
XPath is case-sensitive, and the HTML parser converts tag and attribute names to lowercase:
>>> lxml.html.fromstring("<JOB/>").tag
'job'
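If the data is really XML rather than HTML (an assumption on my part, going by the sample), another option is to use an XML parser, which keeps names exactly as written. Here is a minimal sketch using the standard library's xml.etree.ElementTree, so that it runs without extra installs; lxml.etree.fromstring should behave the same way for this input, but I have not shown it here:

```python
import xml.etree.ElementTree as ET

myxml = '''
<cooperate>
<job DecreaseHour="1" table="tpa_radio_sum">
</job>
<job DecreaseHour="2" table="tpa_radio_sum">
</job>
<job DecreaseHour="3" table="tpa_radio_sum">
</job>
</cooperate>
'''

# An XML parser does not lowercase names, so the mixed-case
# attribute "DecreaseHour" can be matched as it appears in the source.
root = ET.fromstring(myxml)
nodes1 = root.findall('.//job[@DecreaseHour="1"]')
nodes2 = root.findall('.//job[@table="tpa_radio_sum"]')

print("nodes1 =", len(nodes1))  # 1 matching element
print("nodes2 =", len(nodes2))  # 3 matching elements
```

With the HTML parser you keep the lowercase spelling in the XPath expression; with an XML parser you keep the original spelling.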