lxml to parse html

Stefan Behnel stefan_ml at behnel.de
Mon Jan 23 03:56:11 EST 2012


contro opinion, 23.01.2012 08:34:
>     import lxml.html
>     myxml='''
>     <cooperate>
>         <job DecreaseHour="1" table="tpa_radio_sum">
>         </job>
> 
>         <job DecreaseHour="2" table="tpa_radio_sum">
>         </job>
> 
> 
>         <job DecreaseHour="3" table="tpa_radio_sum">
>         </job>
>     </cooperate>
>     '''
>     root=lxml.html.fromstring(myxml)
>     nodes1=root.xpath('//job[@DecreaseHour="1"]')
>     nodes2=root.xpath('//job[@ne_type="101"]')
>     print "nodes1=",nodes1
>     print "nodes2=",nodes2
> 
> What I get is:
> nodes1=[]  and
> nodes2=[<Element job at 0x13636f0>]
> Why is nodes1 [] while nodes2 is [<Element job at 0x13636f0>]?

Not on my side. I get two empty lists.


> This is so strange. Why?

The really strange thing that I don't understand is why you would use an
HTML parser to parse an XML document. You should use lxml.etree instead.
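
A minimal sketch of what I mean, under a few assumptions of mine: it uses
findall() with a relative path instead of xpath(), so that the same code
also runs on the standard library's xml.etree.ElementTree when lxml is not
installed (that fallback import is my addition, not part of your code). As
far as I know, the HTML parser lowercases attribute names, which would
explain why '@DecreaseHour' cannot match there, while an XML parser keeps
the case as written.

```python
try:
    # lxml.etree is the XML API recommended above (third-party package).
    from lxml import etree
except ImportError:
    # Assumed fallback: the stdlib ElementTree offers the same findall() API.
    import xml.etree.ElementTree as etree

myxml = '''
<cooperate>
    <job DecreaseHour="1" table="tpa_radio_sum">
    </job>
    <job DecreaseHour="2" table="tpa_radio_sum">
    </job>
    <job DecreaseHour="3" table="tpa_radio_sum">
    </job>
</cooperate>
'''

# Parse as XML, so attribute names keep their original case.
root = etree.fromstring(myxml)

# './/job' searches all descendants of the root element.
nodes1 = root.findall('.//job[@DecreaseHour="1"]')
nodes2 = root.findall('.//job[@ne_type="101"]')

print("nodes1=", [job.get("DecreaseHour") for job in nodes1])
print("nodes2=", nodes2)
```

With an XML parser, nodes1 should contain exactly one job element and
nodes2 should be empty, since no element carries an ne_type attribute.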

Stefan



